ABSTRACT

As  networks  become  more  distributed,  the  bulk  of  the  communication  is  being  carried  out  between  end-users  or 
devices.  The  distributed  nature  of  the  communication  poses  novel  challenges  for  efficient  operation  of  such  networks 
and  requires  design  considerations  that  are  fundamentally  different  from  those  of  classical  point-to-point 
communication  systems.  This  thesis  studies  two  such  design  issues,  (1)  delay  management  and  (2)  security,  and 
attempts  to  understand  the  information-theoretic  limits  of  distributed  communication  with  regard  to  these  issues. 

First,  the  tradeoff  between  delay  and  partial  reconstruction  in  peer-to-peer  (P2P)  networks  is  studied,  i.e.,  the  number 
of  messages  a  peer  must  obtain  to  reconstruct  a  given  fraction  of  the  data.  Using  a  binary  erasure  version  of  the 
multiple  descriptions  (MD)  problem  to  model  the  P2P  network,  the  thesis  presents  coding  schemes  based  on 
systematic  MDS  (maximum  distance  separable)  codes  and  random  binning  strategies  that  achieve  a  Pareto  optimal 
delay-reconstruction tradeoff. The erasure MD setup is then used to propose a layered coding framework for MD,
which  is  then  applied  to  vector  Gaussian  MD  and  shown  to  be  optimal  for  symmetric  scalar  Gaussian  MD  with  two 
levels  of  receivers  and  no  excess  rate  at  the  central  receiver. 

Second,  delay-reconstruction  tradeoffs  are  studied  for  a  more  decentralized  network  in  which  peers  are  allowed  to 
encode  and  generate  their  own  messages  based  on  their  current  partial  knowledge  of  the  file,  and  a  coding  scheme 
based  on  erasure  compression  and  Slepian-Wolf  binning  is  presented.  The  coding  scheme  is  shown  to  provide  a 
Pareto  optimal  delay-reconstruction  tradeoff  for  the  case  of  symmetric  peers  (i.e.,  each  peer  generates  packets  of  the 
same  rate).  In  the  process  of  characterizing  the  aforementioned  tradeoff,  an  improved  outer  bound  on  the  rate  region 
of  the  general  multi-terminal  source  coding  problem  from  information  theory  is  also  established.  It  is  further  shown 
that  in  the  case  of  asymmetric  peers,  the  aforementioned  coding  scheme  is  not  optimal. 

Third,  lossy  compression  is  studied  from  the  viewpoint  of  security.  An  adversarial  lossy  source  coding  problem  is 
considered  in  which  a  source  is  encoded  into  n  packets,  any  t  of  which  may  be  altered  in  an  arbitrary  way  by 
Byzantine  adversaries.  The  decoder  receives  the  n  packets  and,  without  knowing  which  packets  were  altered,  seeks  to 
reconstruct  the  original  source  to  meet  a  distortion  constraint.  A  layered  architecture  for  this  problem  is  examined, 
which  separates  lossy  compression  from  coding  for  adversarial  errors.  This  architecture  is  shown  to  be  optimal  for 
binary  sources  with  Hamming  distortion  and  Gaussian  sources  with  quadratic  distortion,  yet  suboptimal  in  general. 

Finally,  an  adversarial  n-encoder  lossless  source  coding  problem  with  multiple  sources  is  considered  in  which  the 
number  of  packets  corrupted  by  adversaries  is  unknown  to  the  honest  entities  in  the  network.  It  is  shown  that  this 
problem  is  equivalent  to  an  instance  of  the  symmetric  MLD  (multi-level  diversity)  coding  problem  with  n  sources 
and  n  encoders,  in  which  there  are  no  adversaries  but  the  decoder  may  receive  only  a  subset  of  the  n  messages  and 
reconstructs  a  subset  of  the  n  sources. 
FUNDAMENTAL LIMITS OF DELAY AND SECURITY IN
CHAPTER 1

INTRODUCTION 

Modern-day networks are constantly and rapidly growing in size. Not only is
an appreciably larger amount of data being transferred over vast geographic
distances, but users also expect improved quality of service (QoS), leading to
more stringent requirements on delay, error rates, and dropped packets. The
sheer size of the networks and the amount of traffic have led to a push toward
distributed architectures to increase efficiency and reduce cost. Distributed
systems provide a number of advantages over centralized systems; they are, for
instance, more scalable as the number of users grows and require only partial
knowledge of the network. However, while centralized point-to-point
communication has been relatively well studied and its fundamental limits are
well understood, we still lack an understanding of many fundamental problems in
decentralized communication. For instance, how do we best allocate network
resources in distributed/cloud storage systems? How do sensors in a distributed
sensor network communicate efficiently while meeting power constraints? What
are the communication requirements to meet performance and QoS guarantees in
decentralized networks, and how are they different from those in centralized
networks? What security and privacy issues can arise in distributed systems,
and how do they affect communication? In this thesis, we focus on two such
design issues, delay and security, and attempt to understand the
information-theoretic limits of communication with regard to these issues.
1.1 Delay-reconstruction Tradeoffs in Content-sharing Networks

Typically, content is distributed from servers to clients via transmission of
packets over a network. For the purposes of sharing content, e.g., a file,
participants can act as both server and client by uploading packets to and
downloading packets from other participants, as is the case in gossip
communication or peer-to-peer architectures (e.g., [1, 2, 3]). One metric that
is widely used to measure the performance of file-sharing systems is the
average amount of time a user must spend in the network, i.e., the average time
taken to download the whole file. There are two schools of thought about how to
build such a system. One is to divide the file into a number of pieces, which
are then circulated among participants without any coding. Participants
therefore acquire a partial copy of the file as soon as they download their
first packet. Such a strategy is susceptible to the coupon collector problem:
the initial few packets can be acquired rapidly, but it takes much longer to
collect the final few packets [4], which significantly increases the overall
download time per participant. The delay performance of BitTorrent, a prominent
P2P architecture based on this school of thought, has been thoroughly analyzed
[5]-[8].
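
The coupon collector effect mentioned above can be made concrete with a short calculation. The sketch below is illustrative only and rests on an assumption not made in the text: each download delivers one of the n pieces uniformly at random, independently of the pieces already held. The function name is hypothetical.

```python
from fractions import Fraction

def expected_downloads(n_pieces, already_have=0):
    """Expected number of uniformly random piece downloads needed to finish
    a file of n_pieces, starting with already_have distinct pieces."""
    # With j distinct pieces in hand, a random piece is new with probability
    # (n - j) / n, so the expected wait for the next new piece is n / (n - j).
    return sum(Fraction(n_pieces, n_pieces - j)
               for j in range(already_have, n_pieces))

n = 100
total = expected_downloads(n)
first_ten = total - expected_downloads(n, already_have=10)
last_ten = expected_downloads(n, already_have=90)
print(float(total), float(first_ten), float(last_ten))
```

Under this model, collecting the last ten pieces of a 100-piece file takes far more downloads on average than collecting the first ten, which is the behavior described in the paragraph above.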

A competing school of thought is to first encode file pieces using random
linear network coding [9] or rateless fountain codes. P2P protocols based on
fountain codes have been considered in [11, 18], and random linear network
coding has been employed in P2P technology in [12]-[15]. The advent of fountain
codes [16]-[19] has been one of the most important recent advances in coding
for packet networks. Fountain codes operate by generating a virtually infinite
number of encoded packets such that the original source can be reconstructed
from any sufficiently large subset of these packets. Fountain codes are
capacity achieving and universal for the class of binary erasure channels, and
they have the additional advantage of being rateless in the sense that the
number of packets to be produced at the encoder can be decided in real time.
Fountain codes also obviate the need for feedback from the receiver to the
transmitter, and they have low encoding and decoding complexity.

Fountain codes can eliminate the coupon collector problem [10, Sec. 2], but
they suffer from poor intermediate performance. In the extreme case, it is not
possible to reconstruct any portion of the original source until all of it can
be reconstructed. In contrast, for feedback-based retransmission schemes for
the erasure channel, each received packet reveals some of the original source.
A user-perceived delay is therefore introduced with fountain codes; if, for
instance, users are downloading a movie, then they must wait for all of the
movie to be downloaded before they can begin watching it.

A fundamental question that arises is whether it is possible to mitigate this
user-perceived delay via partial reconstruction of the source without
increasing the overall transmission time. In particular, assuming that the code
remains capacity achieving over erasure channels, how much of the source can be
reconstructed from a given number of received encoded packets? We distinguish
between two types of partial reconstruction. "In-order" reconstruction refers
to sequential reconstruction in which earlier parts of the source are
reconstructed before later parts. While many network applications require
in-order reconstruction, "out-of-order" reconstruction may be sufficient for
others. It is sufficient, for example, for files that are not organized
linearly, such as an unsorted database. Videos with out-of-order reconstruction
could be played by interpolating over the missing portions. This playback would
be at a lower quality, but it could still be useful, say for determining
whether the downloaded file is the desired one.

Methods for improving the intermediate performance of fountain codes, assuming
out-of-order reconstruction, have been investigated in [20]-[22] in a
coding-theoretic setting. The source is encoded into a large number of packets
such that any k suffice for reconstructing the source. Intermediate performance
is then characterized by the fraction of the original source string that can be
reconstructed when m encoded packets are received, where 0 < m < k. An upper
bound on this fraction is provided in [20], and lower bounds that come close to
the upper bound, obtained by designing suitable output degree distributions for
various values of m, are provided in [20]-[22]. This enhanced intermediate
performance comes at the cost of an increased overall transmission time,
however, i.e., the codes are no longer capacity achieving. Moreover, as
mentioned in [22], designing degree distributions to boost intermediate
performance for a particular value of m degrades intermediate performance for
other values of m.

In this work, we take a more information-theoretic approach and address the
issue of optimal partial reconstruction without increasing the overall
transmission time. We model the source as a bit string that is encoded into n
packets. We impose the constraint that the receiver be able to reconstruct a
fraction 1 - D of the source from any k packets, and we require that the sum
rate of these packets equal the minimum rate for which this is possible. We
then ask what fraction of the source block can be reconstructed from m packets,
where m < k, allowing for out-of-order reconstruction. For this setup, we
provide a coding scheme based on MDS codes that yields significant partial
reconstruction while meeting the aforementioned constraints, and is provably
Pareto optimal in m for 1 < m < k and any n and k, and absolutely optimal for
certain values of m, n and k.
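
The any-k and partial-reconstruction requirements can be illustrated with a toy construction for n = 3 and k = 2. This is a minimal sketch and not the scheme developed in this thesis: it assumes a (3, 2) single-parity (XOR) code, which is MDS, applied to three segments of the source, with the parity position rotated so that the three packets are symmetric. The function names and packet layout are hypothetical.

```python
from itertools import combinations

def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]

def encode(source):
    """Encode a bit list (length divisible by 6) into 3 packets.
    Segment j is split into halves a_j, b_j; packet j carries a_j XOR b_j
    and the other two packets carry a_j and b_j uncoded (systematic part)."""
    if len(source) % 6 != 0:
        raise ValueError("source length must be divisible by 6")
    seg_len = len(source) // 3
    half = seg_len // 2
    packets = [[], [], []]
    for j in range(3):
        seg = source[j * seg_len:(j + 1) * seg_len]
        a, b = seg[:half], seg[half:]
        first, second = [i for i in range(3) if i != j]
        packets[j].append(("p", j, xor(a, b)))
        packets[first].append(("a", j, a))
        packets[second].append(("b", j, b))
    return packets

def decode(received, n_bits):
    """Reconstruct what can be recovered; unknown bits are None (erasures)."""
    seg_len = n_bits // 3
    half = seg_len // 2
    out = [None] * n_bits
    for j in range(3):
        parts = {kind: bits for pkt in received
                 for kind, seg, bits in pkt if seg == j}
        # Any two of (a, b, parity) determine the third.
        if "a" not in parts and "b" in parts and "p" in parts:
            parts["a"] = xor(parts["b"], parts["p"])
        if "b" not in parts and "a" in parts and "p" in parts:
            parts["b"] = xor(parts["a"], parts["p"])
        base = j * seg_len
        if "a" in parts:
            out[base:base + half] = parts["a"]
        if "b" in parts:
            out[base + half:base + seg_len] = parts["b"]
    return out

source = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]
packets = encode(source)
for i, j in combinations(range(3), 2):
    print(i, j, decode([packets[i], packets[j]], len(source)) == source)
print(decode([packets[0]], len(source)))
```

In this toy example each packet carries half as many bits as the source, so any two packets have a sum rate equal to the source length, and a single packet reveals one third of the source bits with the remainder erased. Nothing is claimed here about whether this single-packet performance is optimal.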

Our source coding problem can be viewed as a binary erasure version of the
multiple descriptions (MD) problem [23]-[32]. Multiple description coding is a
technique in which a source is encoded into several messages that are sent to
the decoder, only a subset of which are assumed to reach their destination. The
decoder uses them to reproduce the source, with the fidelity of the
reproduction depending on which packets are received. The problem considered in
this work amounts to an MD problem with distortion measured using the erasure
distortion measure [33, p. 338]: the decoder's reproduction of the source may
contain erasures but not errors, and the fraction of erasures in the
reproduction is defined to be its "distortion" with respect to the original. In
the terminology of multiple descriptions, our rate constraint is called a "no
excess rate" condition [25].
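
The erasure distortion measure described above can be written out in a few lines. This is a hedged sketch: the erasure marker and the function name are hypothetical, and assigning infinite distortion to a reproduction that contains an outright error is a common convention that is assumed here rather than taken from the text.

```python
ERASURE = None  # hypothetical marker for an erased symbol

def erasure_distortion(source, reconstruction):
    """Fraction of erased positions in the reconstruction.
    Returns infinity if any non-erased symbol disagrees with the source,
    since errors are not permitted under this measure."""
    if len(source) != len(reconstruction):
        raise ValueError("sequences must have the same length")
    erased = 0
    for s, r in zip(source, reconstruction):
        if r is ERASURE:
            erased += 1
        elif r != s:
            return float("inf")
    return erased / len(source)

print(erasure_distortion([1, 0, 1, 1], [1, ERASURE, 1, ERASURE]))
```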

It is worth noting that the erasure version of the MD problem has some unique
virtues. The erasure distortion measure is universal in that it can be
reasonably employed for a wide array of digital data sources. This sidesteps
the difficult question of how to measure distortion for complicated, real-world
data sources such as video. The binary erasure MD problem with no excess rate
and no distortion for every k out of n messages is particularly relevant to
peer-to-peer networks, since it can be used to study the tradeoff between the
performance of fountain codes and a competing technology: BitTorrent [3]. For
large n and small k, our MD problem mimics rateless fountain codes, since out
of a large number of descriptions, only a relatively small number must be
received (collected) in order to reconstruct the source with a specified
distortion. For k = n, the MD problem resembles BitTorrent, where all of the
relevant packets must be received to allow for complete reconstruction of the
source. BitTorrent provides good intermediate performance but suffers from the
"coupon collector" problem: the initial packets can be acquired quickly at the
receiver, but it takes much longer to obtain the last few packets. By varying n
and k in the binary erasure MD model, the middle ground between fountain codes
and BitTorrent can be explored. Our results suggest that choosing n to be an
integer multiple of k would provide some of the advantages of both
technologies.

The erasure MD problem could also serve as a starting point for the design of
practical codes for network rate distortion. In the theoretical development of
modern channel codes such as LDPC, many of the code designs and performance
characterizations were first established for the erasure channel [34]. Finally,
the erasure MD problem yields results that are more positive in nature than
those of other MD instances. In particular, for many sources, the no excess
rate assumption necessarily yields poor intermediate performance (e.g., [24]):
if a coding scheme is near-optimal for k receptions, it often yields high
distortion for m < k receptions. For the binary erasure MD problem, however, we
shall see that it is possible to obtain good intermediate performance under no
excess rate.
1.1.1 Results

We shall henceforth use the terms packets, messages, and descriptions
interchangeably. We focus on binary erasure MD with no excess rate for every k
out of n descriptions, i.e., any subset consisting of k messages has a total
rate of R(D_k), where D_k is the distortion constraint when k messages are
received. We consider symmetric descriptions, i.e., the rates of the n
descriptions are the same and the distortion constraint depends only on the
number of messages received. In fact, no excess rate implies symmetric
descriptions for k < n: if every k out of n descriptions have sum rate R(D_k),
then each rate must be R(D_k)/k. We examine two distortion criteria: a
worst-case distortion criterion, which measures the reconstruction fidelity by
the maximum of the per-letter distortion over all source sequences, and an
average-case distortion criterion, which measures the reconstruction fidelity
by the average of the per-letter distortion over all source sequences. The
average-case criterion is the standard criterion used in the literature. The
worst-case criterion is less commonly used, but it has the advantage of being
universal in the sense that it is insensitive to the source distribution, which
in practice is often unknown. Our main contributions are:

1. proposing, for all n and k, coding schemes for both worst-case and
average-case distortion criteria and characterizing their achievable distortion
region when m < k descriptions are received at the decoder. The scheme for
worst-case distortion is a zero-error coding scheme based on MDS (maximum
distance separable) codes. The scheme for average-case distortion is based on
random binning and can be viewed as a concatenation of (n, 1) and (n, k)
source-channel erasure codes [29].

2. providing, for both worst-case and average-case distortion criteria, a tight
lower bound on the distortion when a single message is received at the decoder.
For worst-case distortion, the lower bound holds for all n and k. Moreover, we
show that the MDS coding scheme is Pareto optimal in the achievable distortions
D_1, ..., D_k for all n and k, and, for certain ranges of n and k, is also
absolutely optimal when more than one message is received at the decoder. For
average-case distortion, our lower bound holds,
modulo a closure operation, for all n and k satisfying (l - < [...]. In
addition, for n > 3 and k = 2, we provide a lower bound on the optimal
single-message distortion that differs by exactly 1/n from the distortion
achieved by the random binning scheme. Our results for the special case in
which there is no distortion for k messages (i.e., any k messages allow the
decoder to reconstruct the original source sequence completely) have appeared
in [35] (average-case distortion) and [36] (worst-case distortion).

3. proposing a coding scheme, based on the binary erasure MD coding schemes,
for vector Gaussian MD and showing that it is optimal for scalar Gaussian MD
with two levels of receivers and no excess rate for the central receiver. The
scheme involves quantizing the vector Gaussian source according to a given
quadratic distortion constraint and then transmitting the quantized version
over the n channels according to the aforementioned binary erasure coding
schemes. This demonstrates how the binary erasure coding schemes can be used as
part of a more general, layered coding scheme for multiple descriptions with a
generic source distribution and arbitrary distortion metric.
1.1.2 In-Order Reconstruction

Several prior works have considered the problem with in-order reconstruction.
Albanese et al. [38] propose a coding method that involves assigning a priority
level to messages and encoding them into packets. A message can be decoded from
any subset of packets; however, the priority level of a message determines the
minimum number of packets required to reconstruct that message. This amounts to
in-order reconstruction, because messages with higher priority must be
reconstructed before messages with lower priority. The in-order reconstruction
problem can also be viewed as an instance of symmetric multilevel diversity
(MLD) coding [39]. Comparing these results with those in this work shows that
guaranteeing in-order reconstruction requires significantly higher rates. Walsh
et al. [40] study the rate-delay tradeoff for in-order reconstruction in
multi-path networks where time-ordered source packets arrive out of order at
the destination. The channel between the transmitter and receiver is therefore
different from the packet erasure channel considered here, since any packet
sent by the transmitter eventually arrives at the receiver, albeit not in the
order in which it was transmitted. The authors introduce delay mitigating codes
with the aim of minimizing delay at the receiver when the source bits are
reconstructed in order from encoded packets arriving out of order.

1.2  Decentralized  Encoding 

We next focus on delay-reconstruction tradeoffs in P2P networks with
decentralized encoding, i.e., peers generate coded packets based on their own
partial copies of the file. Within this context, we address the question posed
in Section 1.1: if we assume optimal decentralized encoding and that the
packets might be received in any order, then how much of the file can be
reconstructed from a given number of received packets?

Since the centralized version of the problem was addressed by posing it as a
multiple description problem, it is natural to study the decentralized version
by posing it as an instance of multiple descriptions with distributed encoding,
which in the literature is called the robust CEO problem [47]. In the CEO
problem, n encoders observe independently corrupted versions of a source and
then transmit encoded messages, based on their partial knowledge of the source,
to a decoder that attempts to reconstruct the source from the n messages to
meet a distortion constraint. There is no communication among the encoders, as
shown in Figure 3.1. In the robust variant of the problem, the encoders behave
as in the CEO problem, but instead of using all n messages to reconstruct the
source, the decoder must reconstruct it from any subset of the n messages
subject to different distortion constraints for each subset.

We employ a particular instance of the robust CEO problem that we call the
binary erasure robust CEO problem. In this instance, the source to be
communicated is binary and i.i.d. uniform. The encoders observe this binary
source passed through independent binary erasure channels. Thus, some of each
encoder's file is missing, but none of it is incorrect. Moreover, when the
decoder reconstructs the file, it is not permitted to introduce errors,
although it is allowed to output an erasure for any source bit about which it
is uncertain. The "distortion" is the fraction of erasures in its
reconstruction. In turn, the decoder could then create new coded packets from
its reconstruction and distribute them to other peers. Although we focus on the
case in which the source is binary, we expect that most of the analysis will
carry over to uniform sources over larger alphabets, which could be used to
model audio samples, transform coefficients, video frames, or BitTorrent
pieces.
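
The observation model of the binary erasure robust CEO problem can be simulated directly. The sketch below is an illustration of the model only, not of the coding scheme studied in this thesis: it forwards observations uncoded and merges them, and it assumes a common erasure probability of 0.5 for every peer, a value chosen purely for the example. All function names are hypothetical.

```python
import random

def observe(source, erasure_prob, rng):
    """Binary erasure channel: each bit is independently replaced by None."""
    return [None if rng.random() < erasure_prob else bit for bit in source]

def merge(observations):
    """Combine erasure-only observations: a bit is known if any peer saw it.
    Observations never contain errors, so any known copy can be trusted."""
    merged = []
    for column in zip(*observations):
        known = [b for b in column if b is not None]
        merged.append(known[0] if known else None)
    return merged

def erased_fraction(seq):
    return sum(b is None for b in seq) / len(seq)

rng = random.Random(0)
source = [rng.randint(0, 1) for _ in range(1000)]
peers = [observe(source, 0.5, rng) for _ in range(3)]
merged = merge(peers)
print([erased_fraction(p) for p in peers], erased_fraction(merged))
```

The merged sequence contains erasures but no errors, and its erased fraction is the "distortion" in the sense defined above.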

The binary erasure robust CEO problem lends itself to a natural coding scheme
in which individual encoders (peers) perform vector quantization of their
observed partial source sequences using erasure test channels, followed by
Slepian-Wolf binning. This is a particularization of the general scheme
proposed in [47]. We first consider the case of symmetric peers and we show,
using very different techniques from those used in the centralized case [36],
that this coding scheme achieves a delay-reconstruction tradeoff that is Pareto
optimal over a range of received messages. The same problem for Gaussian
sources and the quadratic distortion measure has been considered in [48], where
an achievable information-theoretic rate region has been derived. Optimality
results for the symmetric case of the Gaussian problem have been presented in
[49].

In the process of proving our result, we also establish a new outer bound for
the general multi-terminal source coding problem that improves upon the outer
bound of Wagner and Anantharam [44]. We further show that if we relax the
symmetry assumptions about the encoders, then the coding scheme is no longer
optimal, even for a simple setup with two encoders.


1.3  Lossy  Source  Coding  with  Byzantine  Adversaries 

While the rapid growth of modern-day communication networks makes them
increasingly useful, it also makes them increasingly difficult to protect
against attacks. This is especially true of those networks, such as
peer-to-peer systems, in which the nodes are controlled by different entities.
In the case of peer-to-peer networks, malicious users could sabotage the
file-sharing process by intentionally transmitting a corrupted version of the
file. Similar problems can potentially arise in ad-hoc networks and distributed
storage systems.

There has been considerable work on how to protect transmitted information
against malicious users within the context of channel and network coding, and a
number of significant results are available. Yeung and Cai [53] show that if z
unit-capacity edges in an acyclic multicast network are subject to random or
adversarial errors, then the network capacity is C - 2z, where C is the network
capacity when all edges are error-free. Thus if an adversary controls z edges,
it effectively removes 2z edges from the original adversary-free network (see
also [54]-[59] and the references therein). This is reminiscent of the
Singleton bound, and we refer to it as the "factor-of-2" rule. The factor-of-2
rule was also shown to hold for lossless source coding: it is well known that
if a source X is to be losslessly communicated via n packets, then the sum rate
of those packets must be at least H(X). Kosut and Tong [60] have shown that if
t of the n packets can be altered in arbitrary ways by adversaries, then every
n - 2t packets must have sum rate at least H(X). Thus t traitors effectively
remove 2t packets from the original adversary-free problem, i.e., the
factor-of-2 rule holds.

In the context of peer-to-peer systems, often the ultimate goal is to communicate
a file approximately rather than reliably. Codes and fundamental limits
for this problem are less well understood (but see [61]-[62]). One natural
approach to this problem is to perform separate compression and adversarial
error-protection. That is, one combines rate-distortion-optimal lossy compression
with network codes that are optimal for the adversarial model at hand.

We show that this approach is optimal in some cases but suboptimal in
general, even for networks with one sender, one receiver, and no intermediate
nodes. Specifically, we consider the problem in which a source is compressed
to form $n$ packets, any $t$ of which can be altered in an arbitrary way. The
decoder receives the $n$ packets and, without knowing which packets were altered,
must estimate the source to meet a given distortion constraint. We show that
separate compression and adversarial error correction achieve rate-distortion
performance governed by the factor-of-2 rule, and that this is optimal for
binary sources with the Hamming distortion measure and Gaussian sources with
the mean square error distortion measure. These two optimality results hinge
on a combinatorial result of Kleitman [66] on the maximum size of subsets of
Hamming space with a given diameter, and the Brunn-Minkowski inequality,
respectively.

We then show by means of a counterexample, involving a binary source with
erasure distortion, that separation is not optimal in general. We consider a
3-encoder problem with one traitor such that one encoder has rate $R < 1$, while
the other two have rate 1 and can therefore transmit the source sequence exactly.
We determine the optimal distortion for this problem as a function of $R$ and
show that separation cannot achieve it. We note that while source-channel
separation has long been known to fail in many scenarios (e.g., [63, 64, 65]),
the reason that it fails here seems to be fundamentally different from the
standard examples.


1.4  Organization of the Thesis

Chapter 2: We study delay-reconstruction tradeoffs in P2P networks. We
formulate the $n$-channel binary erasure MD problem in Section 2.1. Sections 2.2
and 2.3 are devoted to our results for worst-case distortion and average-case
distortion, respectively. In Section 2.4, we describe the layered architecture for
MD and present our results for vector Gaussian multiple descriptions.

Chapter 3: We study delay-reconstruction tradeoffs in P2P networks with
decentralized encoding. In Section 3.1, we formulate the binary erasure robust
CEO problem more precisely and describe our coding scheme. In Section 3.2
we consider the symmetric version of the binary erasure robust CEO and show
that the above coding scheme provides a Pareto optimal delay-reconstruction
tradeoff. In Section 3.3, we consider an asymmetric, two-encoder version of the
problem and show that the coding scheme is not optimal.

Chapter 4: We formulate the lossy source coding problem with Byzantine
adversaries. In Section 4.2, we present the separation-based coding scheme for
general sources and arbitrary distortion measures and show that it achieves the
factor-of-2 rule. In Sections 4.3 and 4.4, respectively, we prove that our scheme
is optimal for uniform binary sources with Hamming distortion and Gaussian
sources with squared error distortion. In Section 4.5, we show that the
factor-of-2 rule is pessimistic for binary sources and erasure distortion.

Chapter ??: We study lossless source coding with multiple sources and an
unknown number of adversaries and show its equivalence to the symmetric MLD
problem.


CHAPTER 2

ERASURE  MULTIPLE  DESCRIPTIONS 

2.1  Notation  and  Problem  Formulation 

We use uppercase letters to denote random variables, bold letters to represent
vectors and script letters to denote their ranges. Realizations of random
variables are denoted by lowercase letters, and realizations of random vectors are
denoted by bold lowercase letters. A superscript appearing with a vector (e.g.,
$X^l$) indicates the length of the vector. Matrices are also represented in boldface.
Let $\{X_t\}_{t=1}^{\infty}$ be a memoryless uniform binary source, with the random variables
$X_t$ taking values in the alphabet $\mathcal{X} = \{+, -\}$. Let $\hat{\mathcal{X}}$ be the reconstruction space
$\{+, -, 0\}$, where 0 denotes the erasure symbol, with an associated distortion
measure $d : \mathcal{X} \times \hat{\mathcal{X}} \to \{0, 1, \infty\}$ such that

$$d(x, \hat{x}) = \begin{cases} 0 & \text{if } \hat{x} = x \\ 1 & \text{if } \hat{x} = 0 \\ \infty & \text{otherwise.} \end{cases}$$

The above per-letter measure is known as the erasure distortion measure [33,
p. 338]. An encoder is a function $f_i^{(l)} : \mathcal{X}^l \to \{1, \ldots, M_i^{(l)}\}$. A decoder is a function
$g_{\mathcal{K}}^{(l)} : \prod_{k \in \mathcal{K}} \{1, \ldots, M_k^{(l)}\} \to \hat{\mathcal{X}}^l$, where $\mathcal{K}$ is the set of descriptions received.
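
As a concrete illustration of the erasure distortion measure, the following minimal Python sketch evaluates $d(x, \hat{x})$ per letter and averages it over a block. The character encoding of the symbols (`"+"`, `"-"`, and `"0"` for the erasure) and the function names are assumptions made for this example only; they are not notation from the thesis.

```python
import math

ERASURE = "0"  # assumed encoding of the erasure symbol in {+, -, 0}


def erasure_distortion(x: str, x_hat: str) -> float:
    """Per-letter erasure distortion: 0 if correct, 1 if erased, infinity if wrong."""
    if x not in ("+", "-"):
        raise ValueError("source symbol must be '+' or '-'")
    if x_hat == x:
        return 0.0
    if x_hat == ERASURE:
        return 1.0
    return math.inf


def block_distortion(source: str, reconstruction: str) -> float:
    """Average of the per-letter distortion over a length-l block."""
    if len(source) != len(reconstruction):
        raise ValueError("source and reconstruction must have equal length")
    total = sum(erasure_distortion(x, y) for x, y in zip(source, reconstruction))
    return total / len(source)
```

Because a wrong (non-erased) symbol costs infinity, any reconstruction with finite distortion is an erased copy of the source, and its block distortion is simply the fraction of erased positions.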

Let $\mathcal{N} = \{1, \ldots, n\}$. The $n$-channel multiple descriptions problem, illustrated
in Figure 2.1, can be formulated as follows. There are $n$ encoders. Encoder
$f_i^{(l)}$, $i \in \mathcal{N}$, encodes and transmits a description of a length-$l$ source sequence
$X^l$ over channel $i$. The receiver either receives this description without errors or
it does not receive it at all. Excluding the case where none of the descriptions is
received, the receiver may receive $2^n - 1$ different combinations of the $n$
descriptions. Thus it can be represented by the $2^n - 1$ decoding functions $g_{\mathcal{K}}^{(l)}$,
$\mathcal{K} \subseteq \mathcal{N}$, $\mathcal{K} \neq \emptyset$. Based on the set of descriptions received, the receiver employs the
corresponding decoding function to output a reconstruction of the original source
string subject to a distortion constraint. We consider symmetric descriptions,
i.e., each description has the same rate and the distortion constraint depends
only on the number of descriptions received.

We measure the fidelity of the reconstruction using two distortion criteria: a
worst-case distortion criterion, under which distortion is measured by taking the
maximum of the per-letter distortion over all source sequences, and an
average-case distortion criterion, under which distortion is measured by taking the
average of the per-letter distortion over all source sequences. We define achievability
for the two criteria as follows. Let $\hat{X}_{\mathcal{K}}^l = g_{\mathcal{K}}^{(l)}(\{f_k^{(l)}(X^l) : k \in \mathcal{K}\})$ be the
reconstruction sequence corresponding to the source sequence $X^l$.

Definition 1 (Worst-case distortion). The rate-distortion vector $(R, D_1, \ldots, D_n)$ is
achievable if for some $l$ there exist encoders $f_i^{(l)}$, $i \in \mathcal{N}$, and decoders $g_{\mathcal{K}}^{(l)}$, $\mathcal{K} \subseteq \mathcal{N}$,
$\mathcal{K} \neq \emptyset$, such that

$$R \ge \frac{1}{l} \log M_i^{(l)} \quad \text{for all } i, \text{ and}$$

$$D_k \ge \max_{\mathcal{K} : |\mathcal{K}| = k} \; \max_{x^l \in \mathcal{X}^l} \; \frac{1}{l} \sum_{t=1}^{l} d(x_t, \hat{x}_{\mathcal{K},t}).$$

We use $\mathcal{RD}_{\text{worst}}$ to denote the set of achievable rate-distortion vectors.

Definition 2 (Average-case distortion). The rate-distortion vector $(R, D_1, \ldots, D_n)$
is achievable if for some $l$ there exist encoders $f_i^{(l)}$, $i \in \mathcal{N}$, and decoders $g_{\mathcal{K}}^{(l)}$, $\mathcal{K} \subseteq \mathcal{N}$,
$\mathcal{K} \neq \emptyset$, such that¹

$$R \ge \frac{1}{l} \log M_i^{(l)} \quad \text{for all } i, \text{ and}$$

$$D_k \ge \max_{\mathcal{K} : |\mathcal{K}| = k} \; E\left[ \frac{1}{l} \sum_{t=1}^{l} d(X_t, \hat{X}_{\mathcal{K},t}) \right].$$

We use $\mathcal{RD}_{\text{avg}}$ to denote the set of achievable rate-distortion vectors and
$\overline{\mathcal{RD}}_{\text{avg}}$ to denote its closure. We describe our results for worst-case distortion
in the next section and for average-case distortion in Section 2.3. For both
distortion criteria, we consider the case where there is no excess rate for every $k$ out
of $n$ descriptions, i.e., $kR = R(D_k) = 1 - D_k$, where $R(\cdot)$ is the Shannon
rate-distortion function. Thus $R = (1 - D_k)/k$. We shall henceforth use $R_k(D_k)$ to
denote $(1 - D_k)/k$. Our goal is to characterize the achievable distortions $D_1, \ldots, D_n$
for both distortion criteria.


Figure  2.1:  The  n-channel  multiple  descriptions  problem 


It should be pointed out that the $k = n$ case is particularly simple. Let
$D_i$, $i \in \mathcal{N}$, be the distortion constraint when the receiver receives $i$ messages. No
excess rate for $n$ descriptions dictates that the sum-rate of the $n$ messages is
exactly $(1 - D_n)$, which in turn implies that the rate of each message is $(1 - D_n)/n$.
The problem then reduces to characterizing the optimal $D_1, \ldots, D_{n-1}$. Consider
a coding scheme that takes a source string of length $l$ and erases the last $lD_n$
bits. The remaining $l(1 - D_n)$ bits are divided into $n$ disjoint parts, each
consisting of $l(1 - D_n)/n$ bits. Encoder $i$ transmits the $l(1 - D_n)/n$ bits in the $i$th
part to the decoder over the $i$th channel, with erasures in place of the remaining
$l - l(1 - D_n)/n$ bits. Thus upon reception of any $k$ descriptions, the decoder can
reconstruct $kl(1 - D_n)/n$ bits of the original source string. Clearly, this scheme
achieves $D_k = 1 - k(1 - D_n)/n$ under both the worst-case and average-case
distortion criteria. Moreover, for any code that achieves the rate-distortion
vector $((1 - D_n)/n, D_1, \ldots, D_n)$, every description has rate $(1 - D_n)/n$ and therefore
the point-to-point rate-distortion function dictates that any set of $k$ messages can
reveal no more than a fraction $k(1 - D_n)/n$ of the bits of the original source string.

¹ All logarithms and exponentiations have base 2 unless explicitly stated.


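Thus

$$\max_{\mathcal{K} : |\mathcal{K}| = k} \; \max_{x^l \in \mathcal{X}^l} \; \frac{1}{l} \sum_{t=1}^{l} d(x_t, \hat{x}_{\mathcal{K},t}) \ge 1 - \frac{k(1 - D_n)}{n},$$

$$\max_{\mathcal{K} : |\mathcal{K}| = k} \; E\left[ \frac{1}{l} \sum_{t=1}^{l} d(X_t, \hat{X}_{\mathcal{K},t}) \right] \ge 1 - \frac{k(1 - D_n)}{n}.$$

Thus the aforementioned coding scheme achieves the optimal $D_1, \ldots, D_n$ under
both the worst-case and average-case distortion criteria.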
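
The uncoded scheme for the $k = n$ case can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the list-of-characters representation, and the use of `"0"` for the erasure symbol are assumptions of the example, and `kept` stands for the $l(1 - D_n)$ retained symbols, which is assumed to be divisible by $n$.

```python
from fractions import Fraction
from itertools import combinations


def uncoded_descriptions(source, n, kept):
    """Split the first `kept` symbols into n disjoint equal parts.

    Description i carries part i uncoded and erasures ("0") everywhere else.
    """
    l = len(source)
    if kept > l or kept % n != 0:
        raise ValueError("kept must be at most l and divisible by n")
    part = kept // n
    descriptions = []
    for i in range(n):
        d = ["0"] * l
        d[i * part:(i + 1) * part] = source[i * part:(i + 1) * part]
        descriptions.append(d)
    return descriptions


def merge(received):
    """Decoder: take every non-erased symbol found in any received description."""
    out = ["0"] * len(received[0])
    for d in received:
        for t, symbol in enumerate(d):
            if symbol != "0":
                out[t] = symbol
    return out


def erasure_fraction(reconstruction):
    """Distortion of an erased copy of the source: the fraction of erasures."""
    return Fraction(reconstruction.count("0"), len(reconstruction))
```

For instance, with $l = 12$, $n = 3$ and $D_n = 1/4$, each description carries 3 uncoded symbols, and any $k$ of them leave a fraction $1 - k(1 - D_n)/n$ of the string erased, independent of the source realization.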


We use the insight obtained from the $k = n$ case to construct codes for the
more complicated case in which $k < n$. No excess rate for a particular set of
$k$ descriptions requires that the information transmitted over those $k$ channels
be independent. Since we impose no excess rate for every size-$k$ subset of
descriptions, the information transmitted over any $k$ channels must be mutually
independent. The coding scheme for $k = n$ ensures that this condition is met
by dividing an erased version of the source string into $n$ disjoint (and therefore
independent) parts and transmitting them uncoded over the $n$ channels. This
strategy of sending independent uncoded bits works as long as the bits
transmitted over each channel are disjoint. In particular, if $R_k(D_k) = (1 - D_k)/k \le 1/n$
(equivalently, $D_k \ge 1 - k/n$), the source string can always be divided into $n$
disjoint, equal parts, each containing a fraction $R_k(D_k)$ of the total number of bits.
If $D_k < 1 - k/n$, however, then $R_k(D_k) > 1/n$ and it is not possible to divide
the source string into $n$ disjoint parts each containing a fraction $R_k(D_k)$ of the
total number of bits, since each part must then contain more than $1/n$ of the
total number of bits. Transmitting uncoded bits, therefore, will be optimal for a
rate up to $1/n$ only; in order to achieve a rate larger than $1/n$, additional
information about the source must be transmitted along with each description, and
this information must be mutually independent for every set of $k$ descriptions.
Random binning schemes can be designed in order to convey independent
information about the source to the decoder such that any $k$ messages reveal the
source string to a specified distortion. Such schemes, however, suffer from the
"cliff effect": nothing can be reconstructed from fewer than $k$ messages, and
once $k$ messages have been received, additional messages provide no reduction
in distortion at all.

By using a hybrid of these two approaches, i.e., transmission of uncoded
bits and random binning, we can achieve an incremental reduction in distortion
with each additional message while still satisfying the necessary independence
conditions. With fewer than $k$ messages, the decoder can partially reconstruct
the source string using the uncoded bits alone. With $k$ or more messages, the
decoder can use the random binning component to decode the source string to
a specified distortion $D_k$, and can then use the uncoded bits in the messages
to further reduce distortion. The resulting distortion therefore decreases
linearly with the number of messages received, with a sudden downward jump at
$k$ when additional information about the source can be decoded from the
binning. Figure 2.2 depicts how the achievable distortion varies with the number
of descriptions received at the decoder when $D_k = 0$. We provide outer bounds
on the distortion region which show that such a hybrid scheme is optimal in a
number of scenarios.

[Figure 2.2 plot: achievable distortion versus descriptions received (1, 2, 3, ..., k-1, k, k+1, ..., n).]

Figure 2.2: The achievable distortion region for $D_k = 0$. The achievable
distortion decreases linearly with the number of descriptions received up to $k - 1$
descriptions, and drops abruptly to zero upon reception of $k$ or more
descriptions.

The threshold $D_k = 1 - k/n$ plays an important role in our coding scheme. If
$D_k \ge 1 - k/n$, then transmission of independent uncoded bits over the $n$
channels as described above is sufficient. If $D_k < 1 - k/n$, then in addition to sending
uncoded bits, we also send coded information. For the worst-case distortion
measure, we describe this scheme in detail in Section 2.2.1, using MDS codes
to realize the coding. Achievability for average-case distortion follows from the
achievability result for worst-case; however, an alternative proof is included in
Appendix A.8 that does not rely on MDS arguments by using random binning
instead. The optimality results for the two distortion criteria are different, with
our results for worst-case distortion being the stronger of the two.


2.2  The Worst-case Distortion Criterion

We begin by presenting a zero-error coding scheme based on systematic MDS
codes that works for finite blocklengths. The scheme consists of two parts:
uncoded bits and an MDS-code component. With each message, the encoder
sends uncoded bits along with an encoded version of the source, using an $(n, k)$
systematic MDS code for the encoding. The decoder outputs the bits revealed
by the systematic part of the MDS code as the source reconstruction if fewer than
$k$ descriptions are received. If $k$ or more descriptions are received, the decoder
uses the uncoded bits and the bits revealed by the systematic part of the MDS
code to decode the encoded erased version by applying an MDS decoding
algorithm. The following subsection discusses the achievable distortion region of
the MDS coding scheme.


2.2.1  An  Achievability  Result 


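Definition 3. Given $n$, $k \le n$, and $D_k \in [0, 1]$, define

$$\mathbf{R} = \big( R_k(D_k),\; 1 - R_k(D_k),\; \ldots,\; 1 - (k-1)R_k(D_k),\; D_k,\; D_k - R_k(D_k),\; \ldots,\; D_k - (n-k)R_k(D_k) \big), \text{ and}$$

$$\tilde{\mathbf{R}} = \Big( R_k(D_k),\; 1 - \frac{1}{n},\; \ldots,\; 1 - \frac{k-1}{n},\; D_k,\; \Big(\frac{n-k-1}{n-k}\Big) D_k,\; \Big(\frac{n-k-2}{n-k}\Big) D_k,\; \ldots,\; \Big(\frac{1}{n-k}\Big) D_k,\; 0 \Big).$$

Theorem 1. Let $D_k$ be a rational number in the interval $[0, 1]$. For any $n$ and $k \le n$, if
$D_k \ge 1 - \frac{k}{n}$, then $\mathbf{R} \in \mathcal{RD}_{\text{worst}}$. If $D_k < 1 - \frac{k}{n}$, then $\tilde{\mathbf{R}} \in \mathcal{RD}_{\text{worst}}$.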


Proof. Case I: $D_k \ge 1 - \frac{k}{n}$, $D_k$ rational

Since $D_k$ is rational, there exists a positive integer $l'$ such that $l'R_k(D_k)$ is a
positive integer. Choose a blocklength $l = \alpha n l'$, where $\alpha$ is any positive integer.
Observe a length-$l$ source sequence $X^l$, and divide $X^l$ into $n$ disjoint parts such
that each part contains $l/n = \alpha l'$ bits. (The division is the same regardless of the
source realization.) Label the parts $X_i$, $i \in \mathcal{N}$. Choose $lR_k(D_k)$ bits from each
of the $n$ parts (since $D_k \ge 1 - \frac{k}{n}$, $lR_k(D_k) \le l/n$ and therefore $lR_k(D_k)$ bits can be
chosen from each part). Denote by $Y_i$ the set of $lR_k(D_k)$ bits chosen from $X_i$.
Transmit $Y_i$ uncoded over the $i$th channel.

The decoding is trivial. If $m$ descriptions, say $(Y_1, \ldots, Y_m)$, are received,
output $\hat{X}_m^l$ as the reconstruction of $X^l$, where $\hat{X}_m^l$ is such that the $mlR_k(D_k)$ bits
corresponding to $(Y_1, \ldots, Y_m)$ are non-erased and the other $(l - mlR_k(D_k))$ bits are
erasures. Since the reconstruction sequence has $l - mlR_k(D_k)$ erasures
regardless of the source sequence, the worst-case distortion $D_m$ is $(l - mlR_k(D_k))/l =
1 - mR_k(D_k)$. When $k$ descriptions are received, the worst-case distortion is
$1 - kR_k(D_k) = D_k$. Thus $\mathbf{R} \in \mathcal{RD}_{\text{worst}}$.

Case II: $D_k < 1 - \frac{k}{n}$, $D_k$ rational

For this case, we present an achievability scheme based on MDS (maximum
distance separable) codes². Let $m$ be the smallest integer such that $2^m > n$ and
$\frac{mk(n-k)}{n(1-D_k)-k}$ is an integer (such an $m$ exists because $D_k$ is rational). Define $q = 2^m$,
and construct a $q$-ary MDS code of length $q - 1$ and dimension $k$. By repeatedly
puncturing this $(q - 1, k)$ MDS code, we obtain a punctured MDS code of size
$(n, k)$ [42, p. 190]. The punctured coordinates are revealed to the decoder. Let $G_1$
be the generator matrix of the punctured $(n, k)$ MDS code, and assume without
loss of generality that $G_1$ is systematic, i.e., $G_1$ is of the form $[I_k | A]$, where $I_k$ is
the $k \times k$ identity matrix and $A$ is a $k \times (n - k)$ matrix over the finite field GF($q$).
Construct matrices $G_2, \ldots, G_n$ by shifting the columns of $G_1$ to the right, i.e., $G_i$
is the matrix formed by shifting the columns of $G_1$ by $i - 1$ places, with the last
$i - 1$ columns of $G_1$ wrapping around. In particular, if $G_1 = [I_k | a_1 \ldots a_{n-k}]$, where
$a_1, \ldots, a_{n-k}$ are the columns of $A$, then $G_i = [a_{n-k-i+2} \ldots a_{n-k} | I_k | a_1 \ldots a_{n-k-i+1}]$.

² An $(n, k)$ MDS code is a linear code that meets the Singleton bound with equality, i.e., the Hamming
distance between any two distinct codewords is at least $n - k + 1$. Reed-Solomon codes, for instance,
are MDS codes.
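
The cyclic-shift structure of the generator matrices can be checked with a small Python sketch. The helper names and the 0-based column indexing are assumptions of this example, and the entries standing in for the columns of $A$ are arbitrary placeholders rather than an actual MDS code; the sketch only tracks where the identity block lands after each shift.

```python
def shifted_generator(G1, j):
    """Return G_j: the columns of G1 shifted right by j - 1 places, wrapping around."""
    n = len(G1[0])
    s = (j - 1) % n
    return [row[n - s:] + row[:n - s] for row in G1]


def systematic_columns(n, k, j):
    """0-based column indices occupied by the identity block I_k in G_j."""
    return {(j - 1 + c) % n for c in range(k)}


def matrices_with_systematic_column(n, k, i):
    """Indices j (1-based) such that column i (0-based) of G_j is systematic."""
    return [j for j in range(1, n + 1) if i in systematic_columns(n, k, j)]
```

For any column index, `matrices_with_systematic_column` returns exactly $k$ matrices, which is the counting fact the decoder relies on: each received description exposes one systematic symbol from exactly $k$ of the encoded vectors.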

Encoding: The encoding procedure is illustrated in Figure 2.3. Let $X^l$ be the
observed source string, of length $l = \frac{mnk(n-k)}{n(1-D_k)-k}$ bits. Divide $X^l$ into $n$ disjoint
parts, each of length $\frac{l}{n} = \frac{mk(n-k)}{n(1-D_k)-k}$ bits. (The division is done the same way
regardless of the source realization.) Let $X_i$, $i \in \mathcal{N}$, denote the last $\frac{lD_k}{n-k}$ bits of the $i$th
part. Construct an erased version $X_e^l$ by replacing the last $\frac{lD_k}{n-k}$ bits in each of the
$n$ parts by erasures. Thus $X_e^l$ has $l\big(1 - \frac{nD_k}{n-k}\big) = mnk$ bits. Each of the $n$ parts of $X_e^l$
has $mk$ bits and can therefore be treated as a concatenation of $k$ binary strings of
length $m$, such that each of these binary strings is the binary representation of
an element in GF($q$). Thus each of the $n$ parts of $X_e^l$ can be mapped to a vector of
length $k$ in GF($q$). Label these vectors $Z_j$, $j \in \mathcal{N}$. Let $Y_j = Z_j G_j$, $j \in \mathcal{N}$. Thus
the $Y_j$ are length-$n$ vectors in GF($q$). Let $Y_{ji} = Z_j g_{ji}$ denote the $i$th element of $Y_j$
(here $g_{ji}$ is the $i$th column of $G_j$). Transmit $(X_i, Y_{ji} : j \in \mathcal{N})$ over the $i$th channel.

Decoding: Suppose $c < k$ descriptions are received at the decoder. Let $\mathcal{M} \subseteq
\mathcal{N}$ denote the set of indices of the received descriptions. Assume without loss
of generality that $i \in \mathcal{M}$. Thus the decoder receives $X_i$ and $(Y_{ji} = Z_j g_{ji} : j \in \mathcal{N})$.
Thus $\frac{lD_k}{n-k}$ bits are revealed to the decoder via $X_i$. Now for a fixed $i$, exactly
$k$ of the $G_j$, $j \in \mathcal{N}$ (in particular, $G_{i-k+1}, \ldots, G_i$), will have their $i$th column
in the systematic part. Thus one symbol from $k$ of the $Z_j$, $j \in \mathcal{N}$, can be
decoded. By mapping these decoded symbols to their binary representations, the
decoder can obtain a partial reconstruction of $X^l$. Let $\hat{X}_i$ represent the
reconstructed source bits due to the $i$th description. Output $(\hat{X}_i : i \in \mathcal{M})$ as the
reconstruction of $X^l$. If $m \ge k$ descriptions are received, then any $k$ descriptions
reveal $k$ symbols from each of the $Y_j$, $j \in \mathcal{N}$. Also, since the punctured
coordinates are known to the decoder, it can construct a longer codeword from
every partially received codeword by adding erasures in place of the
punctured coordinates. The longer codewords can be treated as codewords from
the original $(q - 1, k)$ MDS code. The original MDS code can subsequently be
decoded by applying an erasure decoding algorithm [42, Ch. 9] and all the
$Z_j$ vectors can be recovered. Mapping the $Z_j$ vectors to their binary
representations reveals the erased version $X_e^l$ of the original source string $X^l$. Output
$\{(X_1, \ldots, X_m)\} \cup \{X_e^l \setminus (X_1, \ldots, X_m)\}$ as the reconstruction of $X^l$.

Figure 2.3: The MDS encoding procedure.


Analysis: We now argue that the above scheme achieves the rate-distortion
vector $\big(R_k(D_k),\, 1 - \frac{1}{n},\, 1 - \frac{2}{n},\, \ldots,\, 1 - \frac{k-1}{n},\, D_k,\, (\frac{n-k-1}{n-k})D_k,\, (\frac{n-k-2}{n-k})D_k,\, \ldots,\, (\frac{1}{n-k})D_k,\, 0\big)$.
For any source string $X^l$, every description (say the $i$th description) consists
of $(X_i, Y_{ji} : j \in \mathcal{N})$. $X_i$ consists of $lD_k/(n - k)$ bits. Now since $Y_{ji}$ is an
element of GF($q$), it can be represented by $m$ bits. Thus $(Y_{ji} : j \in \mathcal{N})$ is a
length-$n$ vector in GF($q$), and can be represented by $mn$ bits. Every
description therefore consists of $mn + lD_k/(n - k)$ bits. Since the source string consists
of $l = mnk(n - k)/(n(1 - D_k) - k)$ source symbols, every description has rate

$$\frac{mn + lD_k/(n - k)}{l} = \frac{1 - D_k}{k}.$$


Moreover, every description received at the decoder reveals $lD_k/(n - k)$ bits via
$X_i$, and exactly one symbol from $k$ of the $Z_j$, $j \in \mathcal{N}$. Each of these $k$ symbols is
an element of GF($q$) and can be represented by $m$ bits. Thus every description
reveals $lD_k/(n - k) + mk$ bits to the decoder. (We note that the bits revealed by
any two descriptions are disjoint. The uncoded bits $X_a$ and $X_b$ are disjoint by
definition for any two descriptions $a$ and $b$. Now suppose descriptions $a$ and
$b$ revealed the same symbol from some $Z_j$. Then $Y_{ja} = Z_j g_{ja} = Z_j g_{jb} = Y_{jb}$,
which implies $a = b$.) Thus if $c < k$ descriptions are received, the decoder can
reconstruct $c(lD_k/(n - k) + mk)$ bits of the original source sequence. Thus

$$D_c = 1 - \frac{c\big(\frac{lD_k}{n-k} + mk\big)}{l} = 1 - \frac{cD_k}{n - k} - \frac{cn(1 - D_k) - ck}{n(n - k)} = 1 - \frac{c}{n}.$$


If $c \ge k$ descriptions are received, say descriptions $1, \ldots, c$, then
$(X_1, \ldots, X_c)$ reveal $clD_k/(n - k)$ bits. Moreover, the erased version of the source
sequence, $X_e^l$, can be reconstructed by applying the MDS erasure decoding
algorithm. The bits revealed by $(X_1, \ldots, X_c)$ are disjoint from the bits revealed
by $X_e^l$. The total number of bits revealed, therefore, is $clD_k/(n - k) + mnk$. Thus

$$D_c = 1 - \frac{cD_k}{n - k} - \frac{n(1 - D_k) - k}{n - k} = \Big(\frac{n - c}{n - k}\Big) D_k.$$

Thus $\tilde{\mathbf{R}} \in \mathcal{RD}_{\text{worst}}$.


□ 
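
The bit-counting in the analysis above can be verified numerically with exact rational arithmetic. The sketch below is illustrative only: the function name and argument order are assumptions of the example, and `m` is any symbol size (in bits) for which the blocklength formula yields an integer, not necessarily the smallest admissible one.

```python
from fractions import Fraction


def mds_scheme_profile(n, k, D_k, m):
    """Return (l, rate, [D_1, ..., D_n]) by counting the bits revealed to the decoder.

    Assumes Case II, i.e. D_k < 1 - k/n, so that n(1 - D_k) - k > 0 and k < n.
    """
    D_k = Fraction(D_k)
    denom = n * (1 - D_k) - k
    if denom <= 0:
        raise ValueError("requires D_k < 1 - k/n")
    l = Fraction(m * n * k * (n - k)) / denom      # blocklength in bits
    uncoded = l * D_k / (n - k)                    # uncoded bits per description
    rate = (m * n + uncoded) / l                   # bits sent per source bit
    profile = []
    for c in range(1, n + 1):
        if c < k:
            revealed = c * (uncoded + m * k)       # uncoded bits plus systematic symbols
        else:
            revealed = c * uncoded + m * n * k     # uncoded bits plus full erased version
        profile.append(1 - revealed / l)
    return l, rate, profile
```

For example, with $n = 4$, $k = 2$, $D_k = 1/4$ and $m = 3$, the blocklength is $l = 48$, each description has rate $3/8 = (1 - D_k)/k$, and the distortions are $(3/4, 1/4, 1/8, 0)$, matching $1 - c/n$ for $c < k$ and $\frac{n-c}{n-k} D_k$ for $c \ge k$.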


2.2.2  Optimality  Results 

In this section we present optimality results for the MDS coding scheme
described in the previous subsection. We first establish some preliminary results
in Appendix A.1 which will be used in the proofs of the following theorems. The
optimality results presented here are stronger than those for average-case
distortion (Section 2.3.2) and yield a more complete characterization of the
achievable distortion region. Since we are dealing with worst-case distortion
constraints, the following results hold for any source distribution.

Theorem 2. For any $n$ and $k$, if $D_k \ge 1 - \frac{k}{n}$ and rational³, then
$\forall\, (R_k(D_k), D_1, \ldots, D_k, \ldots, D_n) \in \mathcal{RD}_{\text{worst}}$, $D_m \ge 1 - mR_k(D_k)$ for all $m \in \mathcal{N}$.

Proof. Let $D_k \ge 1 - \frac{k}{n}$. If a code achieves a certain distortion under worst-case
distortion, then it will achieve that distortion under average-case distortion as
well. The result therefore follows from the first part of Theorem 7. □

Definition 4. Let $X^l$ be a vector taking values in $\mathcal{X}^l$. An erased version of $X^l$ is a
vector $\hat{X}^l(X)$ (where $\hat{X}^l(\cdot)$ is a function of the $X$ string), taking values in $\hat{\mathcal{X}}^l$, such that
$\nexists\, t \in \{1, \ldots, l\}$ such that $\hat{X}_t(X) = +$ and $X_t = -$ or $\hat{X}_t(X) = -$ and $X_t = +$.

³ For this theorem and subsequent theorems in this subsection, we consider rational values
for $D_k$ since any code over a finite blocklength can yield rational distortions only.

The following lemma is integral to the proofs of our optimality results for
worst-case distortion. Intuitively, the lemma says that for any code that encodes
length-$l$ source sequences into $n$ pairwise independent messages, there exists a
source sequence for which each of the $l$ bits can be revealed by at most one of
the $n$ messages.


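Lemma 1. Let $\hat{X}_1^l(X), \hat{X}_2^l(X), \ldots, \hat{X}_n^l(X)$ be erased versions of the source string $X^l \in
\mathcal{X}^l$. Suppose $X^l$ is i.i.d. uniform over $\mathcal{X}^l$. If for all $t \in \{1, \ldots, l\}$, $I(\hat{X}_{it}(X); \hat{X}_{jt}(X)) =
0$ $\forall\, i, j \in \mathcal{N}$, $i \neq j$, then

$$\max_{x^l \in \mathcal{X}^l} \; \sum_{i=1}^{n} \left[ \frac{1}{l} \sum_{t=1}^{l} d(x_t, \hat{x}_{it}(x)) \right] \ge n - 1.$$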


Proof  See  Appendix  A.9. 


□ 


The following theorem proves that the MDS coding scheme is optimal for all
$n$ and $k$ when a single message is received at the decoder.

Theorem 3. For any $n$ and $k$, if $D_k < 1 - \frac{k}{n}$ and rational, then
$\forall\, (R_k(D_k), D_1, \ldots, D_k, \ldots, D_n) \in \mathcal{RD}_{\text{worst}}$, $D_1 \ge 1 - \frac{1}{n}$.

Proof  See  Appendix  A.2.  □ 


The following theorem shows that the MDS coding scheme is Pareto optimal
in the distortions $D_1, \ldots, D_{k-1}$.

Theorem 4. For any $n$ and $k$, $\tilde{\mathbf{R}}$ is Pareto optimal in $D_1, \ldots, D_{k-1}$, i.e., there does
not exist $(R', D'_1, \ldots, D'_n) \in \mathcal{RD}_{\text{worst}}$ such that either $R' < R_k(D'_k)$, or $R' \le R_k(D_k)$,
$D'_i \le 1 - \frac{i}{n}$ for all $1 \le i \le k - 1$ and $D'_j < 1 - \frac{j}{n}$ for at least one $j$, $1 \le j \le k - 1$.

Proof. See Appendix A.3. □

The following theorem shows that for certain values of $m$, $n$ and $k$, the MDS
coding scheme is optimal when $m$ messages are received.

Theorem 5. For any $n$ and $k$, if $m \le \frac{k}{2}$ and $m \mid n$ ($m$ divides $n$), then
$\forall\, (R_k(D_k), D_1, \ldots, D_k, \ldots, D_n) \in \mathcal{RD}_{\text{worst}}$, $D_m \ge 1 - \frac{m}{n}$.

Proof.  See  Appendix  A.4.  □ 

It is worth noting that our converse bounds for $D_k < 1 - \frac{k}{n}$ are sharper
than the cooperative or cut-set bound, which is given by $D_m \ge 1 - mR_k(D_k)$.


2.3  The  Average-case  Distortion  Criterion 

2.3.1  An  Achievability  Result 

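Theorem 6. Let $D_k \in [0, 1]$. For any $n$ and $k \le n$, if $D_k \ge 1 - \frac{k}{n}$, then $\mathbf{R} \in \overline{\mathcal{RD}}_{\text{avg}}$.
If $D_k < 1 - \frac{k}{n}$, then $\tilde{\mathbf{R}} \in \overline{\mathcal{RD}}_{\text{avg}}$.

Proof. Theorem 6 is implied by Theorem 1. However, an alternate, more
conventional proof based on random binning arguments, which also proves Theorem 6
for the closure region $\overline{\mathcal{RD}}_{\text{avg}}$, is included in Appendix A.8. □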


2.3.2  Optimality  Results 

We now present optimality results for average-case distortion. These
optimality results deal primarily with single-message optimality, i.e., when only one
message is received at the decoder, and are weaker than the optimality results
proved earlier for worst-case distortion. Moreover, the optimality results
pertain to the achievable region $\mathcal{RD}_{\text{avg}}$ itself rather than its closure $\overline{\mathcal{RD}}_{\text{avg}}$. In other
words, there exists a "closure gap" between the inner bound in Theorem 6 and
the outer bounds presented below. It should be evident from the proofs of the
optimality results in the previous section that for converse proofs, only the
pairwise independence condition between the component variables $\hat{X}_{it}$ and $\hat{X}_{jt}$ is
important, and this condition follows from independence at the block level. The
difficulty is that when we attempt to prove an outer bound for the closure, no
excess rate imposes a weaker independence condition on the transmitted
messages; messages need not be completely mutually independent but rather nearly
mutually independent (i.e., for any $k$ messages $f_1, \ldots, f_k$, no excess rate yields
$I(f_1; \ldots; f_k) \le \epsilon n$ for some $\epsilon > 0$, rather than $I(f_1; \ldots; f_k) = 0$).

A similar situation for the simpler case of two-channel MD with no excess rate for two descriptions was addressed by Ahlswede in [25], where he used "wringing techniques" to prove a tight outer bound without a closure gap. The wringing technique is a way to infer near independence at the component level given near independence at the block level. By conditioning on suitable random variables, the wringing technique ensures, given two random vectors that are nearly pairwise independent, that they are also nearly pairwise independent in each component. More precisely, if $I(X_1^l; X_2^l) \le \epsilon l$ for some $\epsilon > 0$, then for any $\delta > 0$ there exist $t_1, \ldots, t_m \in \{1, \ldots, l\}$ (where $m \le \epsilon l / \delta$) such that for all $t \in \{1, \ldots, l\}$,

$$I(X_{1t}; X_{2t} \mid X_{1t_1} X_{2t_1}, \ldots, X_{1t_m} X_{2t_m}) \le \delta.$$

It seems natural to employ the wringing technique to remove the closure gap in the optimality results presented here. However, there is one important difference between our MD problem and the two-description problem considered in [25] which renders the wringing technique ineffective in our case. In the two-description case with no excess rate for two descriptions, there is only one set of descriptions (i.e., the set containing both descriptions) for which no excess rate is imposed, resulting in a single pairwise independence condition. In our $n$-description case with no excess rate for every $k$ descriptions, there are $\binom{n}{k}$ sets of $k$ descriptions for which there is no excess rate, and thus there are $\binom{n}{k}$ independence conditions, one for each of the $\binom{n}{k}$ sets. If one applies existing wringing techniques here, then one would obtain a set of conditioning variables for each of the $\binom{n}{k}$ constraints. If these sets of variables happened to be the same for all of the constraints, then we could conclude component-wise independence in all $\binom{n}{k}$ cases, but there is no guarantee that this will happen. Developing wringing techniques for this setup would be useful future work.

The following theorem shows that when only one message is received at the decoder, our coding scheme is optimal, modulo a closure operation, for all $n$ and $k$ satisfying $(1 - 1/n)^k < 1/2$. Recall that, given $D_k$, we use $R_k(D_k)$ to denote $(1 - D_k)/k$.

Definition 5. For any fixed $D_k$, define

$$D_1^* = \inf\{D_1 : (R_k(D_k), D_1, \ldots, D_k, \ldots, D_n) \in \mathcal{RD}_{avg}\}.$$

Theorem 7. For any $n$ and $k < n$, if $D_k \ge 1 - k/n$, then for any $(R_k(D_k), D_1, \ldots, D_k, \ldots, D_n) \in \mathcal{RD}_{avg}$, $D_m \ge 1 - m R_k(D_k)$ for all $m \in \mathcal{N}$. If $D_k < 1 - k/n$, $D_k$ is rational$^4$, and $(1 - 1/n)^k < 1/2$, then $D_1^* \ge 1 - 1/n$.

Proof.  See  Appendix  A.5.  □ 

$^4$ For this theorem and subsequent theorems in this subsection, we consider rational values for $D_k$ since any code over a finite blocklength can yield only rational distortions.

We note that $(1 - 1/n)^k < 1/2$ implies $k > 1/\log(n/(n-1)) := \lambda(n)$. Since $\lambda(n)/n \to 1/\log e$ as $n \to \infty$, the second part of Theorem 7 provides a lower bound on $D_1^*$ for a large range of $k$ when $n$ is large.
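As a quick numerical illustration of the condition $(1 - 1/n)^k < 1/2$, the following Python sketch evaluates the threshold $1/\log_2(n/(n-1))$ and the smallest integer $k$ that satisfies the condition. The function names are hypothetical and chosen only for this example; logarithms are taken to base 2, as elsewhere in this chapter.

```python
import math


def k_threshold(n: int) -> float:
    """Threshold lambda(n) = 1 / log2(n / (n - 1)).

    The condition (1 - 1/n)**k < 1/2 holds exactly when k > lambda(n).
    """
    return 1.0 / math.log2(n / (n - 1))


def condition_holds(n: int, k: int) -> bool:
    """Direct check of (1 - 1/n)**k < 1/2."""
    return (1.0 - 1.0 / n) ** k < 0.5


def smallest_k(n: int) -> int:
    """Smallest integer k strictly above the threshold."""
    return math.floor(k_threshold(n)) + 1


if __name__ == "__main__":
    for n in (4, 10, 100, 1000):
        lam = k_threshold(n)
        # lam / n approaches 1 / log2(e) = ln 2 as n grows
        print(n, lam, smallest_k(n), lam / n)
```

For $n = 4$ the smallest admissible value is $k = 3$, which is consistent with the pair $n = 4$, $k = 2$ falling outside the range covered by the second part of Theorem 7.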

The following theorem proves single-message optimality for the coding scheme when $n = 4$ and $k = 2$. This case is not included in Theorem 7.

Theorem 8. Let $D_k < 1 - k/n$ and rational. If $n = 4$ and $k = 2$, then $D_1^* \ge 1 - 1/n$.

Proof.  See  Appendix  A.6.  □ 

Theorem 7 handles the regime in which $k$ is large. We now study the other extreme, i.e., when $k$ is small. In particular, we look at the $k = 2$ case. The following theorem provides a lower bound on the optimal single-message distortion for $n \ge 3$ and $k = 2$. This lower bound differs from the distortion achieved by our coding scheme by exactly $1/n$, and thus becomes progressively tighter as $n$ increases.

Theorem 9. Let $D_k < 1 - k/n$ and rational. If $k = 2$, then for $n \ge 3$, $D_1^* \ge 1 - 2/n$.

Proof. See Appendix A.7. □

We  conjecture  that  the  lower  bound  in  Theorem  9  is  not  tight  and  that  our 
scheme  is  in  fact  optimal.  Evidence  for  this  is  provided  by  Theorem  8. 


2.4  A  General  Multiple  Descriptions  Architecture 

The scheme described above provides a substrate that can be used to construct no-excess-rate multiple descriptions codes for a general source using only a point-to-point rate-distortion code for that source. We illustrate this idea for a Gaussian source, where the resulting scheme is optimal in a certain sense. The extension to arbitrary sources should be clear from the proof. Suppose that $\{X_t\}_{t=1}^{\infty}$ is a memoryless Gaussian process, where $X_t$ is a vector of length $N$ and has a marginal distribution $\mathcal{N}(0, K_x)$. The distortion for a source-reconstruction pair $(X^l, \hat{X}^l)$ is measured as $E\left[\frac{1}{l}\sum_{t=1}^{l}(X_t - \hat{X}_t)(X_t - \hat{X}_t)^T\right]$. We compare distortions in the positive definite sense, i.e., $D_A \succeq D_B$ iff $D_A - D_B \succeq 0$.


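Definition 6. The rate-distortion vector $(R, D_1, \ldots, D_n)$ is achievable if for some $l$ there exist encoders $f_i^{(l)} : \mathbb{R}^{N \times l} \to \{1, \ldots, M_i^{(l)}\}$, $i \in \mathcal{N}$, and decoders $g_{\mathcal{K}}^{(l)} : \prod_{k \in \mathcal{K}} \{1, \ldots, M_k^{(l)}\} \to \mathbb{R}^{N \times l}$, $\mathcal{K} \subseteq \mathcal{N}$, $\mathcal{K} \ne \emptyset$, such that

$$R \ge \frac{1}{l} \log M_i^{(l)} \quad \forall\, i, \text{ and}$$

$$D_k \succeq E\left[\frac{1}{l}\sum_{t=1}^{l}(X_t - \hat{X}_{\mathcal{K},t})(X_t - \hat{X}_{\mathcal{K},t})^T\right] \quad \forall\, \mathcal{K} \subseteq \mathcal{N},\ |\mathcal{K}| = k,$$

where $\hat{X}_{\mathcal{K}}^l = E[X^l \mid f_i^{(l)}(X^l),\ i \in \mathcal{K}]$.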


We use $\mathcal{RD}_{gauss}$ to denote the set of achievable rate-distortion vectors and $\overline{\mathcal{RD}}_{gauss}$ to denote its closure. We consider symmetric descriptions, i.e., each description has the same rate $R_G$ and the distortion constraint depends only on the number of descriptions received. We consider the case where there is no excess rate for every $k$ out of $n$ descriptions, i.e., $kR_G = R(D_k)$, where $R(\cdot)$ is the Shannon rate-distortion function and

$$R(D_k) = \min_{D} \frac{1}{2} \log \frac{|K_x|}{|D|} \quad \text{s.t. } D \preceq D_k \text{ and } D \preceq K_x.$$

Thus $R_G = \frac{1}{k} R(D_k)$ bits/symbol.
Definition 7.

$$\mathbf{R}_G = \left(R_G,\ \frac{D_k + (n-1)K_x}{n},\ \frac{2D_k + (n-2)K_x}{n},\ \ldots,\ \frac{(k-1)D_k + (n-k+1)K_x}{n},\ D_k, \ldots, D_k\right).$$

Theorem 10. $\mathbf{R}_G \in \overline{\mathcal{RD}}_{gauss}$.


Proof. It suffices to show that for any $\epsilon > 0$, the rate-distortion vector

$$\mathbf{R}_{G+\epsilon} = \left(R_G + \epsilon,\ \frac{D_k + \epsilon I + (n-1)K_x}{n},\ \ldots,\ \frac{(k-1)(D_k + \epsilon I) + (n-k+1)K_x}{n},\ D_k + \epsilon I, \ldots, D_k + \epsilon I\right)$$

is achievable. For any $\epsilon > 0$, we know from rate-distortion theory that there exist integers $l$ and $l'$, with $l' \le l(R(D_k) + \epsilon)$, such that any source sequence $X^l$ of $l$ symbols can be compressed to a sequence $Y^{l'}$ consisting of $l'$ bits and then reproduced from $Y^{l'}$ with distortion $D_k + \epsilon I$. Fix $\epsilon$ and choose a blocklength $nl$. Using the aforementioned rate-distortion code, we can compress the length-$nl$ source sequence (consisting of $n$ blocks, each of length $l$) into a binary sequence $Y^{nl'}$ taking values in $\mathcal{X}^{nl'}$. Now $Y^{nl'}$ can be treated as $n$ blocks of length $l'$ each, and can be transmitted to the decoder over the $n$ channels using the MDS-coding based scheme proposed in Section 2.2.1. Thus every description contains $l'$ uncoded bits (i.e., one of the $n$ blocks) of $Y^{nl'}$. In particular, the decoder should be able to completely reconstruct $Y^{nl'}$ upon reception of any $k$ descriptions, i.e., there is no distortion for every $k$ out of $n$ descriptions (this corresponds to a special case of Theorem 1 with $D_k = 0$). Thus every set of $k$ descriptions must reveal $nl'$ bits, and therefore the rate of a single description is $R = nl'/(knl) = l'/(kl)$ bits per symbol of $X^l$. Moreover, since every description contains $l'$ uncoded bits, the decoder can reconstruct $ml'$ bits (i.e., $m$ blocks) of $Y^{nl'}$ upon reception of any $m < k$ descriptions.

We now argue that $\mathbf{R}_{G+\epsilon}$ is achievable. The rate of every description is $R = l'/(kl) \le (R(D_k) + \epsilon)/k \le R_G + \epsilon$. Moreover, any $m < k$ descriptions reveal $ml'$ bits ($m$ blocks) of $Y^{nl'}$ completely, and reveal nothing about the other $n - m$ blocks. Thus the decoder can reconstruct a fraction $m/n$ of $X^{nl}$ (i.e., $m$ out of the $n$ blocks of $X^{nl}$) from the $m$ blocks of $Y^{nl'}$ revealed to it with distortion at most $D_k + \epsilon I$, and must reconstruct the remaining fraction without any information (incurring distortion $K_x$). If we take the time average over all blocks, we can see that the decoder can reconstruct $X^{nl}$ with distortion at most

$$\frac{m(D_k + \epsilon I) + (n - m)K_x}{n}.$$

When $k$ or more descriptions are received, the decoder is able to reconstruct $Y^{nl'}$ completely and can reconstruct $X^{nl}$ with distortion at most $D_k + \epsilon I$. □
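To make the resulting operating point concrete, the following Python sketch tabulates the rate and distortion profile of the layered scheme for a scalar Gaussian source in the limit of vanishing $\epsilon$. It is only an illustration under the assumption of a scalar source with variance `var_x`, for which the Shannon rate-distortion function is $\frac{1}{2}\log_2(\sigma_x^2/D)$; the function name and arguments are hypothetical.

```python
import math


def layered_gaussian_profile(n: int, k: int, d_k: float, var_x: float = 1.0):
    """Rate and distortions of the layered scheme, scalar Gaussian source.

    Returns (rate, distortions) where rate is the per-description rate in
    bits/symbol and distortions[m - 1] is the distortion when m descriptions
    are received. Assumes 0 < d_k <= var_x and the epsilon -> 0 limit.
    """
    if not (1 <= k <= n):
        raise ValueError("need 1 <= k <= n")
    if not (0.0 < d_k <= var_x):
        raise ValueError("need 0 < d_k <= var_x")

    # No excess rate for every k descriptions: k * rate = R(d_k)
    rate = 0.5 * math.log2(var_x / d_k) / k

    distortions = []
    for m in range(1, n + 1):
        if m < k:
            # m of the n blocks are decoded at distortion d_k,
            # the remaining n - m blocks are unknown (distortion var_x)
            distortions.append((m * d_k + (n - m) * var_x) / n)
        else:
            # k or more descriptions recover the whole compressed sequence
            distortions.append(d_k)
    return rate, distortions


if __name__ == "__main__":
    rate, dist = layered_gaussian_profile(n=10, k=5, d_k=0.1)
    print(rate, dist)
```

The profile decreases linearly in the number of received descriptions until $k$ descriptions are available, after which it stays at $D_k$.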


Next, we show that, for the special case of symmetric scalar Gaussian multiple descriptions with two levels of receivers (where one receiver reconstructs the source from any $k$ out of $n$ descriptions with distortion $D_k$ and the second receiver reconstructs the source from all $n$ descriptions with distortion $D_n$), and no excess rate for the second receiver, the aforementioned scheme achieves the optimal $D_k$. It has been shown by Wang and Viswanath [43, Theorem 1] that given distortion constraints $D_k$ and $D_n$, the symmetric multiple description rate for an i.i.d. vector Gaussian source with mean 0 and covariance $K_x$ is

$$R = \sup_{K_z \succ 0} \frac{1}{2} \log \frac{|K_x|^{\frac{1}{n}}\, |K_x + K_z|^{\frac{n-k}{kn}}\, |D_n + K_z|^{\frac{1}{n}}}{|D_n|^{\frac{1}{n}}\, |D_k + K_z|^{\frac{1}{k}}}.$$

Thus the sum rate of the $n$ descriptions is

$$nR = \sup_{K_z \succ 0} \frac{1}{2} \log \frac{|K_x|\, |K_x + K_z|^{\frac{n-k}{k}}\, |D_n + K_z|}{|D_n|\, |D_k + K_z|^{\frac{n}{k}}}. \qquad (2.1)$$
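Theorem 11. For scalar Gaussian multiple descriptions (i.i.d. $\mathcal{N}(0, \sigma_x^2)$ Gaussian source) with two levels of receivers (distortion constraints $D_k$ and $D_n$, respectively) and no excess rate for the second receiver, $D_k \ge \frac{k}{n} D_n + \frac{n-k}{n} \sigma_x^2$.

Proof. Assume WLOG that $\sigma_x^2 = 1$. Reducing (2.1) to the scalar case and using the no excess rate condition gives

$$\frac{1}{2} \log \frac{1}{D_n} = \sup_{\lambda > 0} \frac{1}{2} \log \left( \frac{1}{D_n} \cdot \frac{(1 + \lambda)^{\frac{n-k}{k}} (D_n + \lambda)}{(D_k + \lambda)^{\frac{n}{k}}} \right),$$

which implies

$$0 = \sup_{\lambda > 0} \frac{1}{2} \log \frac{(1 + \lambda)^{\frac{n-k}{k}} (D_n + \lambda)}{(D_k + \lambda)^{\frac{n}{k}}}.$$

Define $f(\lambda) = \frac{(1 + \lambda)^{\frac{n}{k} - 1} (D_n + \lambda)}{(D_k + \lambda)^{\frac{n}{k}}}$. Then

$$0 = \sup_{\lambda > 0} \log_e f(\lambda)$$
$$= \sup_{\lambda > 0} \left[ \left(\frac{n}{k} - 1\right) \log_e(1 + \lambda) + \log_e(D_n + \lambda) - \frac{n}{k} \log_e(D_k + \lambda) \right]$$
$$= \sup_{\lambda > 0} \left[ \log_e \frac{D_n + \lambda}{1 + \lambda} + \frac{n}{k} \log_e \frac{1 + \lambda}{D_k + \lambda} \right]$$
$$= \sup_{\lambda > 0} \left[ \log_e \left(1 + \frac{D_n - 1}{1 + \lambda}\right) + \frac{n}{k} \log_e \left(1 + \frac{1 - D_k}{D_k + \lambda}\right) \right].$$

Define

$$g(\lambda) = \frac{\left(\frac{D_n - 1}{1 + \lambda}\right)^2}{2\left(1 - \left|\frac{D_n - 1}{1 + \lambda}\right|\right)^2} + \frac{n}{k} \cdot \frac{\left(\frac{1 - D_k}{D_k + \lambda}\right)^2}{2\left(1 - \left|\frac{1 - D_k}{D_k + \lambda}\right|\right)^2}.$$

Using the fact that

$$\log_e(1 + x) \ge x - \frac{x^2}{2(1 - |x|)^2} \quad \text{for } |x| < 1,$$

we obtain

$$0 \ge \sup_{\lambda > 0} \left[ \frac{D_n - 1}{1 + \lambda} + \frac{n}{k} \cdot \frac{1 - D_k}{D_k + \lambda} - g(\lambda) \right],$$

so that

$$\frac{1 - D_n}{1 + \lambda} \ge \frac{n}{k} \cdot \frac{1 - D_k}{D_k + \lambda} - g(\lambda)$$

and hence

$$\frac{D_k + \lambda}{1 + \lambda} \ge \frac{n}{k} \left( \frac{1 - D_k}{1 - D_n} \right) - \frac{D_k + \lambda}{1 - D_n}\, g(\lambda).$$

Now let $\lambda \to \infty$. Then $\frac{D_k + \lambda}{1 - D_n}\, g(\lambda) \to 0$ and $\frac{D_k + \lambda}{1 + \lambda} \to 1$. We thus have

$$1 \ge \frac{n}{k} \left( \frac{1 - D_k}{1 - D_n} \right), \quad \text{i.e.,} \quad D_k \ge \frac{k}{n} D_n + \frac{n - k}{n}.$$

□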
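The role of the function $f(\lambda) = (1+\lambda)^{n/k-1}(D_n+\lambda)/(D_k+\lambda)^{n/k}$ in this argument can be checked numerically. The Python sketch below assumes unit source variance and uses hypothetical function names; it evaluates $f$ on a grid and illustrates that $f$ stays at or below 1 when $D_k = \frac{k}{n}D_n + \frac{n-k}{n}$, while $f$ exceeds 1 for large $\lambda$ once $D_k$ is pushed below that value, so the supremum of $\log f$ could no longer equal zero.

```python
def f(lam: float, n: int, k: int, d_k: float, d_n: float) -> float:
    """f(lambda) for the scalar, unit-variance case of (2.1)."""
    ratio = n / k
    return (1.0 + lam) ** (ratio - 1.0) * (d_n + lam) / (d_k + lam) ** ratio


def boundary_d_k(n: int, k: int, d_n: float) -> float:
    """Right-hand side (k/n) * D_n + (n - k)/n for unit variance."""
    return (k / n) * d_n + (n - k) / n


def max_f_on_grid(n: int, k: int, d_k: float, d_n: float) -> float:
    """Maximum of f over a logarithmic grid of lambda in [1e-3, 1e6]."""
    grid = [10.0 ** (e / 4.0) for e in range(-12, 25)]
    return max(f(lam, n, k, d_k, d_n) for lam in grid)


if __name__ == "__main__":
    n, k, d_n = 10, 5, 0.05
    d_star = boundary_d_k(n, k, d_n)
    print("boundary D_k:", d_star)
    print("max f at boundary:", max_f_on_grid(n, k, d_star, d_n))
    print("max f below boundary:", max_f_on_grid(n, k, d_star - 0.05, d_n))
```

At the boundary value the inequality $f \le 1$ also follows from the weighted arithmetic-geometric mean inequality applied to $1 + \lambda$ and $D_n + \lambda$ with weights $(n-k)/n$ and $k/n$.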
CHAPTER 3

OPTIMAL DELAY-RECONSTRUCTION TRADEOFFS IN PEER-TO-PEER NETWORKS

3.1  Problem  Formulation  and  Coding  Scheme 

We begin with the formulation of the binary erasure robust CEO problem, depicted in Figure 3.1. Let $\mathcal{N} = \{1, \ldots, n\}$ and $\mathcal{X} = \{+, -\}$. Let $X$ be a uniform binary random variable taking values in $\mathcal{X}$. We assume that this source is i.i.d. over time, and we denote a length-$l$ sequence of $X$ by $X^l$. Define $Y_i = N_i \cdot X$, $i \in \mathcal{N}$, where $N_1, \ldots, N_n$ are independent Bernoulli random variables with $0 < \Pr(N_i = 0) = p_i < 1$. Thus each $Y_i$ is the output of passing $X$ through a binary erasure channel (Figure 3.2) with erasure probability $p_i$, and takes values in $\hat{\mathcal{X}} = \{+, -, 0\}$, where 0 denotes the erasure symbol. There are $n$ encoders, each of which is a function $f_i : \hat{\mathcal{X}}^l \to \{1, \ldots, M_i^{(l)}\}$, $i \in \mathcal{N}$. Encoder $f_i$, $i \in \mathcal{N}$, observes $Y_i^l$ and transmits an encoded version of it over channel $i$. The decoder either receives this description without error or does not receive it at all. Excluding the case in which none of the messages is received, the receiver may receive $2^n - 1$ different combinations of messages. Thus it can be represented by $2^n - 1$ decoding functions $g_{\mathcal{K}}$, $\mathcal{K} \subseteq \mathcal{N}$, $\mathcal{K} \ne \emptyset$, of the form $g_{\mathcal{K}} : \prod_{k \in \mathcal{K}} \{1, \ldots, M_k^{(l)}\} \to \hat{\mathcal{X}}^l$. Based on the set of received messages $\mathcal{K}$, the receiver employs the corresponding decoding function to output a reconstruction $\hat{X}_{\mathcal{K}}^l$ of the original source string $X^l$.

Figure 3.1: The binary erasure robust CEO problem

Figure 3.2: A binary erasure channel (BEC) with erasure probability p

We measure the fidelity of the reconstruction using a family of distortion measures, $\{d_\lambda\}_{\lambda > 0}$, where

$$d_\lambda(x, \hat{x}) = \begin{cases} 0 & \text{if } \hat{x} = x \\ 1 & \text{if } \hat{x} = 0 \\ \lambda & \text{otherwise.} \end{cases}$$

We are particularly interested in the large-$\lambda$ limit, wherein erasures incur unit cost while errors are penalized highly. In this regime, $d_\lambda$ approximates the erasure distortion measure [33, p. 338].


In general, one could impose a distortion constraint for every subset of received messages. This generality is not needed here, however, so we will only measure the distortion as a function of the number of received messages.

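Definition 8. $(R_1, R_2, \ldots, R_n, D_1, D_2, \ldots, D_n)$ is an achievable rate-distortion vector if there exists a block length $l$ for which there exist encoders $f_i$, $i \in \mathcal{N}$, and decoders $g_{\mathcal{K}}$, $\mathcal{K} \subseteq \mathcal{N}$, $\mathcal{K} \ne \emptyset$, such that

$$R_i \ge \frac{1}{l} \log M_i^{(l)} \quad \text{for all } i, \text{ and}$$

$$D_k \ge E\left[\frac{1}{l} \sum_{t=1}^{l} d_\lambda(X_t, \hat{X}_{\mathcal{K},t})\right] \quad \text{for all } \mathcal{K} \text{ s.t. } |\mathcal{K}| = k. \qquad (3.1)$$

Let $\mathcal{RD}_{CEO}(\lambda)$ denote the set of achievable rate-distortion vectors. Define

$$\mathcal{RD}_{CEO} = \bigcap_{\lambda > 1} \mathcal{RD}_{CEO}(\lambda).$$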

It is worth noting that

$$D_k \ge \max_{\mathcal{K} : |\mathcal{K}| = k} \prod_{i \in \mathcal{K}} p_i,$$

since when all of the corresponding $Y_i$ are erased for a given subset of messages, the decoder gets no information about $X$ whatsoever and is forced to output erasures instead. We use $\overline{\mathcal{RD}}_{CEO}$ to denote the closure of $\mathcal{RD}_{CEO}$. In a P2P context, the encoders represent peers in the network that have access to partial copies $Y_i$ of the received file $X$. Peers generate encoded packets in a decentralized fashion, without communicating with other peers, based on their own partial knowledge of the file. The erasure distortion measure measures how much of the file is reconstructed from these encoded messages.

A natural achievability scheme for this setup is vector quantization using erasure test channels followed by Slepian-Wolf binning at each encoder. Since this is a particularization of a scheme in [47], we provide only a high-level description and refer the reader to [47] for a detailed treatment. For a fixed blocklength $l$, Encoder $i$, $i \in \mathcal{N}$, first performs vector quantization of the possible $Y_i^l$ sequences using an erasure test channel (Figure 3.3).

Specifically, Encoder $i$ chooses a parameter $q_i$ for the erasure test channel and generates codewords i.i.d. according to the output distribution of this channel when the input is $Y_i$. These codewords are then divided randomly amongst $2^{lR_i}$ bins. Then given the $Y_i^l$ sequence, the encoder searches for a codeword with which it is typical, and transmits the index of the bin containing this codeword. The decoder receives the bin indices transmitted by a subset of the encoders and, if possible, uses typicality considerations to identify the codewords within the bins that were selected by those encoders. In particular, the decoder searches for codewords that are typical with respect to the output distributions of the encoders' test channels. These codewords will collectively reveal some of the source bits $X^l$ but not others, and the decoder creates a reconstruction $\hat{X}^l$ of the file that specifies the known bits while leaving the remaining ones erased.

Figure 3.3: Erasure test channels

The aforementioned scheme exhibits a fundamental tradeoff between intermediate performance (i.e., the fraction of the file that can be reconstructed when only a subset of the messages is received) and the overall efficiency of the file transfer (i.e., the fraction of the file that can be reconstructed when all $n$ messages are received). Although the scheme is valid for the case where $p_i$ and $R_i$ are different for different encoders, and we have stated it in its most general form, important insight can be gained into the above tradeoff if we consider the special case in which the encoders are symmetric. We therefore consider the scenario in which all of the $Y_i$ have the same erasure probability, $p_i = p$, the rates are identical, $R_i = R$, and all of the encoders use the same test channel parameter, i.e., $q_i = q$ for all $i$.

Let us first understand the performance of the scheme in the symmetric case. Consider the portion of the string $X^l$ that the decoder will be able to reconstruct as a function of the number of messages received. For the first few messages, the decoder will be unable to recover the codewords chosen by the encoders. As such, it will be unable to reproduce any of the bits of $X^l$, and accordingly its reconstruction will be entirely erasures. After sufficiently many messages, say $k$, have been received, the decoder will be able to determine all $k$ codewords from the bin indices and thereby determine some of the source bits. More precisely, the decoder will have access to $k$ codewords, each of which is a copy of the source string with a fraction $p + (1 - p)q$ of the bits erased. Since the erasures in different codewords are independent, the fraction of erasures in the reconstruction will be

$$D_k = (p + (1 - p)q)^k,$$

which by our choice of distortion measure is also the distortion. If additional messages are then received, their associated codewords can be determined through typicality considerations. These additional codewords will allow the decoder to reproduce even more of the bits of the source. In fact, the fraction of erasures in the reconstruction will be

$$D_m = (p + (1 - p)q)^m,$$

where $m$ is the number of received messages. In particular, we have $D_m = D_k^{m/k}$.
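The distortion profile of the symmetric scheme can be tabulated directly from these expressions. The Python sketch below is a minimal illustration with hypothetical function names; it takes the decoding threshold `k` as an input and assigns distortion 1 (all erasures) to fewer than `k` messages, as described above.

```python
def erasure_fraction(p: float, q: float, m: int) -> float:
    """Fraction of source bits erased in all m decoded codewords.

    Each codeword independently erases a bit with probability p + (1 - p) * q.
    """
    return (p + (1.0 - p) * q) ** m


def scheme_profile(n: int, k: int, p: float, q: float) -> list:
    """Distortions D_1, ..., D_n of the symmetric scheme.

    Fewer than k messages cannot be decoded, so the reconstruction is
    entirely erasures (distortion 1).
    """
    if not (1 <= k <= n):
        raise ValueError("need 1 <= k <= n")
    return [1.0 if m < k else erasure_fraction(p, q, m) for m in range(1, n + 1)]


if __name__ == "__main__":
    profile = scheme_profile(n=10, k=5, p=0.1, q=0.5)
    for m, d in enumerate(profile, start=1):
        print(m, d)
```

For every $m \ge k$ the entries satisfy $D_m = D_k^{m/k}$, so fixing $k$ and $D_k$ determines the rest of the profile.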

The relation $D_m = D_k^{m/k}$ captures the tradeoff between the fraction of the file that can be reconstructed from $m < n$ messages (intermediate performance) and the overall efficiency of the file transfer (i.e., the fraction of the file that can be reconstructed from all $n$ messages, which in this case is $1 - D_n = 1 - D_k^{n/k}$). Notice that the above scheme enables us to operate between two extremes in P2P technology which exhibit the same aforementioned tradeoff: (i) peers share packets without coding (e.g., BitTorrent), and (ii) peers share fully encoded packets (e.g., network coding). Letting $k = 1$ allows us to recover the "no coding" case; there is no binning, and every message reveals a partial source string to the decoder. Letting $k = n$ allows us to recover the "coding" case; every quantized codeword is binned, and the decoder can only recover the codewords when all $n$ messages have been received.


Figure 3.4: Performance of the achievability scheme for $n = 10$, $p = 0.1$, and encoder rate $R = 0.25$. The solid curve corresponds to $k = 1$ (no coding), the dotted curve to $k = 10$ (coding), and the dashed curve to $k = 5$.

By varying $k$, therefore, we can interpolate between the "coding" and "no coding" extremes. Figure 3.4 illustrates the performance of the scheme for $n = 10$ and $p = 0.1$. The solid curve corresponds to $k = 1$ (no coding), the dotted curve to $k = 10$ (coding), and the dashed curve to $k = 5$. An encoder rate $R = 0.25$ was used for all three cases. Notice that the "no coding" case yields good intermediate performance; 20% of the file can be reconstructed from a single message, and the distortion falls to 0.8. The overall efficiency, though, is not good; about 15% of the file cannot be reconstructed even when all of the messages have been received. The "coding" case performs contrary to the "no coding" case: nothing at all can be reconstructed with fewer than $n$ messages ($D_m = 1$ for $m < n$), but once $n$ messages have been received, everything can be reconstructed (the distortion is almost 0). The $k = 5$ curve, however, illustrates how the aforementioned scheme allows partial reconstruction of the source with fewer than $n$ messages as opposed to the "coding" case (the decoder can output a partial reconstruction as long as $k$ messages have been received), and also achieves a better overall efficiency with $n$ messages than the "no coding" case (in fact, with $k = 5$, almost all of the file can be reconstructed from $n$ messages).

The  ability  to  partially  reconstruct  the  source  can  prove  vital  in  the  context 
of  distribution  of  content,  e.g.,  video  files,  in  P2P  networks.  In  such  a  scenario, 
our  coding  strategy  can  be  implemented  on  the  level  of  video  frames  rather 
than  bits,  treating  the  entire  video  file  as  a  coding  block.  In  this  case,  users 
with  a  partial  reconstruction  of  the  video  file  can  watch  the  whole  video  by 
interpolating  over  the  missing  frames.  This  would  lead  to  lower  buffering  delay 
(in  Figure  3.4,  for  instance,  the  delay  is  halved  for  k  =  5  as  compared  to  the 
"coding"  case)  and  might  at  the  same  time  yield  adequate  playback  quality, 
depending  on  the  purposes  of  the  user.  As  more  messages  are  received,  users 
would  be  able  to  reconstruct  a  higher  quality  video.  Partial  reconstruction  also 
provides  other  advantages  in  this  context;  peers  with  partially  reconstructed 
files  can  transmit  uncoded  bits  to  peers  that  are  still  waiting  to  receive  enough 
messages  to  start  decoding.  This  would  lead  to  smaller  user-perceived  delays 
than  with  network  coding,  without  compromising  the  overall  efficiency  of  the 
download.  Moreover,  if  users  accidentally  downloaded  the  wrong  file,  they 
would  be  able  to  stop  the  download  after  viewing  the  partial  file. 

In the next section, we prove optimality results for the delay-reconstruction tradeoff exhibited by the aforementioned coding scheme with symmetric encoders. In order to achieve distortion $D_k$ with $k$ messages and fixed encoder rate $R$, we must have $kR \ge R(D_k)$, where $R(\cdot)$ is the Shannon rate-distortion function for the robust CEO problem. In practice, having $kR$ strictly greater than $R(D_k)$ is wasteful, since the additional rate can be used to convey useful information about the source and lower the distortion below $D_k$. We therefore focus on the case when $kR = R(D_k)$, which implies that the encoder rate is just sufficient to achieve distortion $D_k$ with any $k$ messages. This scenario is referred to as no excess rate in information theory.


3.2  Pareto  Optimality  of  the  Scheme  in  the  Symmetric  Case 

We now show that for symmetric encoders, given $k$ and $D_k$, the tradeoff $D_m = D_k^{m/k}$ between the distortion and the number of received messages is Pareto optimal. In particular, we will show that any scheme that achieves distortion $D_k$ for $k$ messages must have $D_m \ge D_k^{m/k}$. It is known from the results in [44] that the minimum per-encoder rate required to achieve a given distortion $D_k$ when any $k$ messages are received is

$$R = \frac{1 - D_k}{k} + g(D_k^{1/k}) \qquad (3.2)$$

where$^1$ $g(\cdot)$ is given by

$$g(x) = \begin{cases} h(x) - (1 - p)\, h\!\left(\frac{x - p}{1 - p}\right) & p \le x \le 1 \\ 0 & x > 1. \end{cases}$$

$^1$ All logarithms and exponentiations in [44] have base $e$ whereas we use base 2 here. Therefore the corresponding expression in [44] is $R = \frac{1}{k}(1 - D_k) \log 2 + g(D_k^{1/k})$.
By choosing the erasure test channel parameter $q$ accordingly, the scheme described above can achieve equality in (3.2). We next show that with this choice of $q$, the scheme is Pareto optimal with respect to $(D_k, D_{k+1}, \ldots, D_n)$: any scheme (with the same rate $R$) that achieves a strictly lower $D_m$ for some $k \le m \le n$ must achieve a strictly larger $D_{m'}$ for some $k \le m' \le n$.

Theorem 12. If $(R, \ldots, R, D_1, \ldots, D_n) \in \overline{\mathcal{RD}}_{CEO}$, and

$$D_k = \inf\Big\{D : (R, \ldots, R, \underbrace{1, \ldots, 1}_{k-1}, D, 1, \ldots, 1) \in \overline{\mathcal{RD}}_{CEO}\Big\},$$

i.e., $R$ is as given by (3.2), then $D_m \ge (D_k)^{m/k}$ for all $m \ge k$.

Note  that  this  result  makes  no  optimality  claims  about  the  performance  of 
the  scheme  when  fewer  than  k  messages  are  received.  Under  this  scheme,  the 
decoder  will  be  unable  to  recover  the  transmitted  codewords  in  this  regime,  so 
it  will  be  forced  to  declare  an  erasure  for  every  bit  in  its  reconstruction.  It  would 
be  interesting  to  determine  if  the  performance  in  this  regime  can  be  improved, 
perhaps by using the ideas in [37].

In  order  to  prove  this  theorem,  we  first  establish  a  new  outer  bound  for  a 
general  problem  in  distributed  rate-distortion. 


3.2.1  Outer  Bound  on  the  Rate  Region  of  the  Multi-terminal 
Source  Coding  Problem 

Consider the general problem in which we have an arbitrary number of discrete memoryless sources $Y_1, \ldots, Y_n$, with $Y_i$ taking values in the set $\mathcal{Y}_i$, encoders $f_i$, $i \in \mathcal{N}$, a hidden source $Y_0$ that is not directly observed by any encoder or the decoder, and a side information source $Y_{n+1}$, taking values in the set $\mathcal{Y}_{n+1}$, which is observed by the decoder but not by any encoder. In particular, $\{Y_{0,t}, Y_{1,t}, \ldots, Y_{n,t}, Y_{n+1,t}\}_{t=1}^{\infty}$ is a vector-valued, finite-alphabet and memoryless source. Although we consider finite-alphabet sources here, the outer bound is extensible to continuous or countable alphabets, e.g., Gaussian sources, by using the approach in [44]. Encoder $f_i$ observes a length-$l$ sequence of $Y_i$ and transmits a message to the decoder based on the mapping

$$f_i^{(l)} : \mathcal{Y}_i^l \to \{1, \ldots, M_i^{(l)}\}.$$

The decoder seeks to reconstruct the sources, or functions of the sources, from subsets of messages $f_{\mathcal{K}} = \{f_k^{(l)} : k \in \mathcal{K}\}$, where $\mathcal{K} \subseteq \mathcal{N}$, $\mathcal{K} \ne \emptyset$. Since we allow the reconstruction of functions of the sources instead of, or in addition to, the sources themselves, we represent the reconstructed sequences by $V_1^l, \ldots, V_J^l$ (with $V_{j,t}$, $t \in \{1, \ldots, l\}$, $j = 1, \ldots, J$, taking values in the set $\mathcal{V}_j$). Given a subset of messages $\mathcal{K} \subseteq \mathcal{N}$, $\mathcal{K} \ne \emptyset$, and $j \in \{1, \ldots, J\}$, the decoder thus uses the mappings

$$(g_{\mathcal{K}}^{j})^{(l)} : \prod_{k \in \mathcal{K}} \{1, \ldots, M_k^{(l)}\} \to \mathcal{V}_j^l.$$

We have $J$ distortion measures

$$d_j : \prod_{i=0}^{n+1} \mathcal{Y}_i \times \mathcal{V}_j \to \mathbb{R}_+,$$

one for each constraint.

For every $j \in \{1, \ldots, J\}$, we impose a common distortion constraint for all size-$k$ subsets of messages used to reconstruct $V_j$. More precisely, for every $j \in \{1, \ldots, J\}$, all $\binom{n}{k}$ subsets of messages of size $k$, when used to reconstruct $V_j$, must satisfy a single distortion constraint. Thus there are $nJ$ distortion constraints in total. Let $Y_{\mathcal{K}}$ denote $(Y_k)_{k \in \mathcal{K}}$, and $\mathbf{Y}$ denote $Y_{\mathcal{N}}$. Moreover, $Y_{i,a:b}$ denotes $\{Y_{i,a}, Y_{i,a+1}, \ldots, Y_{i,b}\}$.
Definition 9. The rate-distortion vector

$$(\mathbf{R}, \mathbf{D}) = (R_1, \ldots, R_n, D_{1,1}, D_{2,1}, \ldots, D_{n,1}, D_{1,2}, \ldots, D_{n,2}, \ldots, D_{1,J}, \ldots, D_{n,J})$$

is achievable if for some $l$ there exist encoders $f_i^{(l)}$, $i \in \mathcal{N}$, and decoders $(g_{\mathcal{K}}^{j})^{(l)}$, $\mathcal{K} \subseteq \mathcal{N}$, $\mathcal{K} \ne \emptyset$, $j = 1, \ldots, J$, such that

$$R_i \ge \frac{1}{l} \log M_i^{(l)}, \text{ and}$$

$$D_{k,j} \ge \max_{\mathcal{K} : |\mathcal{K}| = k} E\left[\frac{1}{l} \sum_{t=1}^{l} d_j\big(Y_{0,t}, Y_{\mathcal{N},t}, Y_{n+1,t}, V_{j,t}^{\mathcal{K}}\big)\right].$$

As in [44], we use $\mathcal{RD}_*$ to denote the set of achievable rate-distortion vectors and $\overline{\mathcal{RD}}_*$ to denote its closure. We use the following definitions from [44].

Definition 10. Let $Y_0, Y_1, \ldots, Y_{n+1}$ be generic random variables with the distribution of the source at a single time. Let $\Gamma_0$ denote the set of finite-alphabet random variables $\gamma = (U_1, \ldots, U_n, V_1, \ldots, V_J, W, T)$ satisfying

(i) $(W, T)$ is independent of $(Y_0, Y_{\mathcal{N}}, Y_{n+1})$,

(ii) $U_i \leftrightarrow (Y_i, W, T) \leftrightarrow (Y_0, Y_{i^c}, Y_{n+1}, U_{i^c})$, shorthand for "$U_i$, $(Y_i, W, T)$ and $(Y_0, Y_{i^c}, Y_{n+1}, U_{i^c})$ form a Markov chain in this order", for all $i \in \mathcal{N}$, and

(iii) $(Y_0, Y_{\mathcal{N}}, W) \leftrightarrow (U_{\mathcal{N}}, Y_{n+1}, T) \leftrightarrow (V_1, \ldots, V_J)$.

Definition 11. Let $\Psi$ denote the set of finite-alphabet random variables $Z$ with the property that $Y_1, \ldots, Y_n$ are conditionally independent given $(Z, Y_{n+1})$.

There are many ways of coupling a given $Z \in \Psi$ and $\gamma \in \Gamma_0$ to the source. We shall only consider the Markov coupling for which $Z \leftrightarrow (Y_0, Y_{\mathcal{N}}, Y_{n+1}) \leftrightarrow \gamma$. We now state our outer bound.
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Definition  12.  Let  7TD0(Z,  7)  =  |(R:D)  : 

^Ri  >  max  (l(Z:  U«|y„+i,  T),  I(Z;  Uk|U/c°,  Fn+i,  T)  j 

i&K 

+  ^  I(Yt]  Ui\Z ,  Yn+ 1,  W,  T)  V/C  C  TV,  and 

i£K. 

Dkj  >  max  E[^(y0,  Y/c,  Yn+i,  Vf)],  j  =  1, . . . ,  J 

K:\IC\=k 

Then  define 

nv0  =  n  u  nv0(z,^). 

ze-i/'iero 

Theorem  13.  TAD,,  zs  an  outer  bound  on  the  rate-distortion  region  for  the  general  prob¬ 
lem,  i.e.,  1ZT>*  C  7£Z?0. 

Proof.  See  Appendix  B.l.  □ 

The new bound is more general than the bound in [44]. Even if we apply it to the setup of [44], however, the new bound offers an improvement. Specifically, whereas the bound in [44] lower bounds the sum rate of a subset $\mathcal{K}$ of messages by $I(Z; U_{\mathcal{K}} | U_{\mathcal{K}^c}, Y_{n+1}, T)$, the new bound improves upon it by taking the maximum of $I(Z; U_{\mathcal{K}} | U_{\mathcal{K}^c}, Y_{n+1}, T)$ and $I(Z; U_{\mathcal{K}} | Y_{n+1}, T)$. This improvement is useful for establishing the main result. Notice that if we have the Markov chain $U_i \leftrightarrow (Y_i, T) \leftrightarrow U_{i^c}$, then $I(Z; U_{\mathcal{K}} | U_{\mathcal{K}^c}, Y_{n+1}, T) \le I(Z; U_{\mathcal{K}} | Y_{n+1}, T)$. Since, in our setup, a weaker Markov chain condition (Definition 10, (ii)) is being imposed, the above inequality might not hold here. However, as we show in the proof of Theorem 12, using $I(Z; U_{\mathcal{K}} | Y_{n+1}, T)$ instead of $I(Z; U_{\mathcal{K}} | U_{\mathcal{K}^c}, Y_{n+1}, T)$ yields a tight lower bound, which suggests that the outer bound in [44] could be loose for our setup.

3.2.2 Proof of Theorem 12


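We begin with the following lemma.

Lemma 2. Suppose $p^m \le \tilde{D}$ and that $(\mathbf{U}, X, \hat{X}_{\mathcal{K}}, \mathbf{Y}, W, T)$ for all $\mathcal{K}$, $|\mathcal{K}| = m$, is such that

(i) $(X, \mathbf{Y}, U_{\mathcal{K}^c}, W) \leftrightarrow (U_{\mathcal{K}}, T) \leftrightarrow \hat{X}_{\mathcal{K}}$,

(ii) $U_i \leftrightarrow (Y_i, W, T) \leftrightarrow (X, Y_{i^c}, U_{i^c})$ for all $i \in \mathcal{N}$, and

(iii) $\frac{1}{m} \sum_{i \in \mathcal{K}} I(Y_i; U_i | X, W, T) \le g(\tilde{D}^{1/m})$.

Let $D = \max_{\mathcal{K} : |\mathcal{K}| = m} E[d_\lambda(X, \hat{X}_{\mathcal{K}})]$. For $\delta \in (0, 1/2]$, if

$$\lambda \ge \max\left(32m,\ \frac{2m}{\tilde{D}},\ \frac{1}{\delta p (1-p)}\right),$$

then $D \ge \tilde{D} - \xi(\tilde{D}, \delta)$ for some continuous $\xi \ge 0$ satisfying $\xi(\tilde{D}, 0) = 0$.

Proof. See Appendix B.2. □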


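Proof of Theorem 12. It suffices to prove Theorem 12 for a single subset of messages of size $m \ge k$. Fix $\delta \in (0, 1/2]$, and suppose $\lambda$ satisfies

$$\lambda \ge \max\left(32m,\ \frac{2m}{D_k},\ \frac{1}{\delta p (1-p)}\right).$$

It follows from Theorem 13 by taking $Z = X$ in the definition of $\mathcal{RD}_o(Z, \gamma)$ (Definition 12) and from the monotonicity of $\mathcal{R}_o(D, \lambda)$ with respect to $\lambda$ that there exist $R \in \mathbb{R}_+$ and $\gamma \in \Gamma_o$ such that, for all subsets $\mathcal{K}$ of size $k$,

$$D_k + \delta \ge E[d_\lambda(X, \hat{X}_{\mathcal{K}})], \quad \text{and}$$

$$kR + \delta \ge k\mathcal{R}_o(D, \lambda) + \delta \ge I(X; U_{\mathcal{K}} | T) + \sum_{i \in \mathcal{K}} I(Y_i; U_i | X, W, T). \qquad (3.4)$$

From (3.2) and (3.4), it follows that

$$\frac{I(X; U_{\mathcal{K}} | T)}{k} + \frac{1}{k} \sum_{i \in \mathcal{K}} I(Y_i; U_i | X, W, T) \le \frac{(1 - D_k)}{k} + g(D_k^{1/k}) + \frac{\delta}{k}. \qquad (3.5)$$

Now by the data processing inequality, $I(X; U_{\mathcal{K}} | T) = I(X; U_{\mathcal{K}}, T) \ge I(X; \hat{X}_{\mathcal{K}})$. Let $\mathcal{E} = \mathbb{1}(X \cdot \hat{X}_{\mathcal{K}} = -1)$. We then have

$$\begin{aligned}
I(X; U_{\mathcal{K}} | T) &\ge H(X) - H(X | \hat{X}_{\mathcal{K}}) \\
&= 1 - H(X, \mathcal{E} | \hat{X}_{\mathcal{K}}) \\
&= 1 - H(\mathcal{E} | \hat{X}_{\mathcal{K}}) - H(X | \mathcal{E}, \hat{X}_{\mathcal{K}}) \\
&\ge 1 - h(D_k / \lambda) - \Pr(\hat{X}_{\mathcal{K}} = 0) \\
&\ge (1 - D_k) - h(\delta).
\end{aligned}$$

Using this and (3.5), we can upper bound $\frac{1}{k} \sum_{i \in \mathcal{K}} I(Y_i; U_i | X, W, T)$ as

$$\frac{1}{k} \sum_{i \in \mathcal{K}} I(Y_i; U_i | X, W, T) \le g(D_k^{1/k}) + \frac{h(\delta)}{k} + \frac{\delta}{k}. \qquad (3.6)$$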

We will now show

$$\frac{1}{m} \sum_{i=1}^{m} I(Y_i; U_i | X, W, T) \le g(D_k^{1/k}) + \frac{h(\delta)}{k} + \frac{\delta}{k}, \quad m \ge k. \qquad (3.7)$$

Suppose the $U_i$ are ordered according to the mutual informations $I(Y_i; U_i | X, W, T)$, i.e., we have an ordered list of messages $U_1, \ldots, U_m$ in which, for all $i, j \in \{1, \ldots, m\}$, $U_i$ and $U_j$ are such that $I(Y_i; U_i | X, W, T) \le I(Y_j; U_j | X, W, T)$ when $i < j$. The last $k$ elements of this list, $U_{m-k+1}, \ldots, U_m$, must satisfy (3.6), i.e.,

$$\frac{1}{k} \sum_{i=m-k+1}^{m} I(Y_i; U_i | X, W, T) \le g(D_k^{1/k}) + \frac{h(\delta)}{k} + \frac{\delta}{k}. \qquad (3.8)$$

All other elements in the list yield equal or smaller mutual informations. Therefore, if we average over a larger subset of messages, the average will never increase. We thus have

$$\frac{1}{m} \sum_{i=1}^{m} I(Y_i; U_i | X, W, T) \le \frac{1}{k} \sum_{i=m-k+1}^{m} I(Y_i; U_i | X, W, T).$$

Using this and (3.8), we obtain (3.7). Define

$$(D_k - \zeta(D_k, \delta))^{1/k} = g^{-1}\left(g(D_k^{1/k}) + \frac{h(\delta)}{k} + \frac{\delta}{k}\right)$$

for some continuous $\zeta \ge 0$ satisfying $\zeta(D_k, 0) = 0$. We then have

$$\frac{1}{m} \sum_{i=1}^{m} I(Y_i; U_i | X, W, T) \le g\left((D_k - \zeta(D_k, \delta))^{1/k}\right). \qquad (3.9)$$

From (3.9), we obtain, by using Lemma 2,

$$D_m \ge (D_k - \zeta(D_k, \delta))^{m/k} - \xi(D_m, \delta)$$

for some continuous $\xi \ge 0$ satisfying $\xi(D_m, 0) = 0$. The proof is completed by letting $\lambda \to \infty$ and then $\delta \to 0$. □
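The averaging step that takes (3.8) to (3.7) can be checked numerically. The following is a minimal sketch only: the function names and the random values standing in for the mutual informations $I(Y_i; U_i | X, W, T)$ are illustrative assumptions, not quantities from this thesis.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def top_k_mean(values, k):
    """Average of the k largest entries (the last k elements of the sorted list)."""
    return mean(sorted(values)[-k:])


def averaging_gap(values, k):
    """Mean of the k largest entries minus the mean of all entries."""
    return top_k_mean(values, k) - mean(values)


# Hypothetical stand-ins for the m = 10 mutual informations.
rng = random.Random(0)
infos = [rng.random() for _ in range(10)]

# For every k <= m, the average over all m messages cannot exceed
# the average over the k largest ones.
gaps = [averaging_gap(infos, k) for k in range(1, len(infos) + 1)]
```

Every entry of `gaps` is non-negative, and the gap closes when $k = m$, which is the sense in which enlarging the subset never increases the average.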


3.3  Suboptimality  in  the  Asymmetric  Case 

In the previous section, we considered symmetric peers and showed that the coding scheme described in Section 3.1 provides a Pareto optimal delay-reconstruction tradeoff. If we consider asymmetric encoder observations, i.e., the binary erasure probabilities $p_i$ of the channels from $X$ to $Y_i$ are not identical, then it becomes natural that encoders encode at different rates, since some encoders (with smaller $p_i$) will have a better knowledge of the source.

We now consider a very simple asymmetric case with two encoders and show that the achievable scheme is no longer optimal; more precisely, the choice of an erasure test channel is no longer optimal. Encoder 1 observes the binary source $X$ directly (i.e., $p_1 = 0$), while Encoder 2 observes an erased version $Y$ of the source with $p_2 = p > 0$. Both encoders transmit messages to a decoder, which then attempts to reconstruct $X$ upon reception of both messages. This setup is referred to as a one-helper problem (Figure 3.5), and the two encoders, Encoders 1 and 2, are referred to as the main encoder and the helper, respectively. The goal is to characterize the tradeoff between the rate of the main encoder, $R_1$, the rate of the helper, $R_2$, and the resulting distortion.


Figure  3.5:  The  erasure  one-helper  problem 

Before showing that erasure test channels are suboptimal for this problem, it is worth mentioning why this suboptimality is unexpected. Existing results in distributed rate-distortion theory suggest a connection between binary erasure problems and their quadratic Gaussian counterparts. For instance, for the Wyner-Ziv problem, both instances have no rate loss [51], and this is shown using erasure and Gaussian test channels, respectively. Similarly, the only two instances of the CEO problem for which conclusive results are available at all rates are the erasure [44] and Gaussian [50] ones, and again the optimal schemes use erasure and Gaussian test channels, respectively. For the quadratic Gaussian version of the one-helper problem [52], Gaussian test channels are known to achieve the entire rate region. This suggests that erasure test channels might be optimal for the erasure version, yet we shall see that they are not in general, even if the decoder's goal is to reproduce $X$ losslessly.

Figure 3.6: A family of symmetric test channels

Specifically, we show that for some rate constraints $R_2$ on the helper, the alternate family of test channels $V_b(\epsilon)$ depicted in Figure 3.6 meets the helper's rate constraint while allowing the primary encoder to use less rate. The optimal test channel for the lossless one-helper problem, given a rate constraint $R_2$ on the helper, is given by the optimal solution to the following optimization problem [33]:

$$\begin{aligned}
\min_{p(v|y)} \quad & H(X | V) \qquad (3.10) \\
\text{s.t.} \quad & I(Y; V) \le R_2 \\
& X \leftrightarrow Y \leftrightarrow V.
\end{aligned}$$

If we restrict the minimization to the family of channels $V_b(\epsilon)$ and the class of erasure channels $V_e(q)$, then it suffices to show that given a rate constraint $R_2$ on the helper, the optimal $H(X | V_b)$ is smaller than the optimal $H(X | V_e)$. Figure 3.7 depicts the optimal $H(X | V_b)$ and $H(X | V_e)$ against $R_2$. Notice that for low values of $R_2$, $H(X | V_b)$ is lower than $H(X | V_e)$, signifying that erasure test channels are the worse of the two families of channels.
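The comparison between the two families can be reproduced numerically under explicit modeling assumptions. Since the figure defining $V_b(\epsilon)$ is not reproduced here, the sketch below assumes that $V_b(\epsilon)$ passes an unerased $Y$ through a binary symmetric channel with crossover $\epsilon$ and maps an erased $Y$ to a uniformly random bit, and that $V_e(q)$ erases an unerased $Y$ with additional probability $q$. The function names and these channel models are assumptions made for illustration.

```python
import math


def h(q):
    """Binary entropy function in bits."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)


def erasure_channel(p, q):
    """Return (I(Y;V_e), H(X|V_e)) for the assumed erasure test channel."""
    a = 1 - (1 - p) * (1 - q)               # probability that V_e is an erasure
    rate = h(a) + (1 - a) - (1 - p) * h(q)  # H(V_e) - H(V_e|Y)
    return rate, a                          # X is known unless V_e is erased


def symmetric_channel(p, eps):
    """Return (I(Y;V_b), H(X|V_b)) for the assumed binary-output test channel."""
    rate = (1 - p) * (1 - h(eps))           # H(V_b) = 1, H(V_b|Y) = (1-p)h(eps) + p
    crossover = (1 - p) * eps + p / 2       # X -> V_b is a BSC with this crossover
    return rate, h(crossover)


def best_equivocation(channel, p, r2, lo, hi):
    """Bisect the channel parameter until the helper rate meets r2.

    Both rates are decreasing in their parameter on [lo, hi].
    """
    for _ in range(200):
        mid = (lo + hi) / 2
        if channel(p, mid)[0] > r2:
            lo = mid
        else:
            hi = mid
    return channel(p, hi)


rate_e, h_e = best_equivocation(erasure_channel, 0.1, 0.05, 0.0, 1.0)
rate_b, h_b = best_equivocation(symmetric_channel, 0.1, 0.05, 0.0, 0.5)
```

Under these assumptions, with $p = 0.1$ and $R_2 = 0.05$, the binary-output channel leaves slightly less uncertainty about $X$ than the erasure channel, and the gap is small.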

Figure 3.7: Plot of $H(X | V_b)$ (solid) and $H(X | V_e)$ (dashed) against $R_2$ for $p = 0.1$. For low values of $R_2$, $H(X | V_b)$ is smaller than $H(X | V_e)$.

The superiority of the $V_b(\epsilon)$ test channel can be understood as follows. Define a Bernoulli random variable $E$ such that $E = 1$ when $Y$ is erased and $E = 0$ when $Y$ is not erased. Since $E$ is a function of $Y$, we have $I(Y; V_b(\epsilon)) = I(Y, E; V_b(\epsilon)) = I(E; V_b(\epsilon)) + I(Y; V_b(\epsilon) | E)$. Likewise, $I(Y; V_e(q)) = I(E; V_e(q)) + I(Y; V_e(q) | E)$. Now $I(E; V_b(\epsilon)) = 0$, i.e., $V_b(\epsilon)$ communicates no information about whether $Y$ is erased. In contrast, $I(E; V_e(q)) > 0$, i.e., erasure test channels expend positive rate transmitting information about the location of erasures in $Y^l$. This information is not pertinent to the problem of reconstructing $X$, and is therefore wasteful. Of course, when $\epsilon > 0$, $X$ can never be determined with certainty from the output of the $V_b(\epsilon)$ channel. If the goal is to reproduce $X^l$ from the helper's codeword, then $V_b(\epsilon)$ would be a poor choice. Here, however, the helper's objective is simply to minimize $H(X | V)$.

Thus the erasure test channel is suboptimal, although Figure 3.7 shows that the benefit of using the alternate test channel $V_b(\epsilon)$ is small. Indeed, numerical solutions to (3.10) for various problem instances suggest that erasure test channels are very nearly optimal and are therefore sufficient in practice. Showing this rigorously is an interesting problem for future work.

CHAPTER 4

LOSSY SOURCE CODING WITH BYZANTINE ADVERSARIES

4.1  Problem  Formulation 

Let $\{X_t\}_{t=1}^{\infty}$ be an i.i.d. source, with the random variables $X_t$ taking values in the (possibly infinite) alphabet $\mathcal{X}$. There are $n$ encoders, $t$ of which are traitors, that observe $X^l$ and transmit a message to a decoder, which then attempts to reconstruct $X^l$ from the received messages up to a specified distortion. The traitors' goal is to maximize the expected distortion in the decoder's reconstruction, and they choose their messages in order to fulfill this goal, with full knowledge of $X^l$, the other $n - 1$ messages, and the decoder's decoding strategy. The number of traitors, $t$, is known to all the encoders and the decoder. However, their location among the $n$ encoders (i.e., which of the $n$ encoders are traitors) is unknown to the honest encoders and the decoder. Moreover, the traitors can observe $X^l$ and then decide which encoders to take over. The traitors' location among the $n$ encoders and their actions can therefore be different for different source sequences.

Let $\hat{\mathcal{X}}$ denote the reconstruction space, with an associated distortion measure $d : \mathcal{X} \times \hat{\mathcal{X}} \to \mathbb{R}$. Let $\mathcal{N} = \{1, \ldots, n\}$. A code $(f_i, g)$ is a collection of encoders $f_i : \mathcal{X}^l \to \{1, \ldots, M_i^{(l)}\}$, $i \in \mathcal{N}$, and a decoder $g : \prod_{i=1}^{n} \{1, \ldots, M_i^{(l)}\} \to \hat{\mathcal{X}}^l$. A rate-distortion vector $(R_1, \ldots, R_n, D)$ is said to be achievable if for all sufficiently large $l$, there exist encoders $f_i$ and a decoder $g$ such that

$$R_i \ge \frac{1}{l} \log M_i^{(l)} \quad \text{for all } i, \text{ and}$$

$$D \ge E\left[\max_{\substack{H \subseteq \mathcal{N} \\ |H| = n - t}} \max_{C_{H^c}} \frac{1}{l} \sum_{t=1}^{l} d\left(X_t, g(C_1, \ldots, C_n)_t\right)\right],$$

where $C_i = f_i(X^l)$ $\forall i \in H$.

Let $\mathcal{RD}$ denote the set of achievable rate-distortion vectors, and let $\overline{\mathcal{RD}}$ denote its closure. Moreover, let $R(\cdot)$ denote Shannon's rate-distortion function.

Definition 13. $\mathcal{RD}^* = \{(R_1, \ldots, R_n, D) : \forall S \subseteq \mathcal{N},\ |S| = n - 2t,\ \sum_{i \in S} R_i \ge R(D)\}$.


Note that $\mathcal{RD}^*$ is the factor-of-2 region. The following theorem, proved in the next section, shows that $\mathcal{RD}^*$ is achievable.

Theorem 14. Suppose there exists a reconstruction sequence $\hat{x}_0^l \in \hat{\mathcal{X}}^l$ such that $d(x^l, \hat{x}_0^l)$ is finite for all $x^l \in \mathcal{X}^l$. Then $\mathcal{RD}^* \subseteq \overline{\mathcal{RD}}$.
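Membership in $\mathcal{RD}^*$ is easy to test directly from Definition 13. The sketch below is illustrative: the function names are assumptions, and the rate-distortion function $R(D) = 1 - h(D)$ used in the example is the uniform binary source with Hamming distortion treated in Section 4.3.

```python
import itertools
import math


def binary_entropy(q):
    """Binary entropy function in bits."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)


def rd_uniform_binary(distortion):
    """R(D) = 1 - h(D) for a uniform binary source with Hamming distortion."""
    if distortion >= 0.5:
        return 0.0
    return 1.0 - binary_entropy(distortion)


def in_region(rates, n_traitors, rd_value):
    """Check Definition 13 by enumerating every subset S with |S| = n - 2t."""
    size = len(rates) - 2 * n_traitors
    if size <= 0:
        raise ValueError("the region is only defined for n > 2t")
    return all(sum(subset) >= rd_value
               for subset in itertools.combinations(rates, size))


def in_region_fast(rates, n_traitors, rd_value):
    """Equivalent check: only the n - 2t smallest rates matter."""
    size = len(rates) - 2 * n_traitors
    if size <= 0:
        raise ValueError("the region is only defined for n > 2t")
    return sum(sorted(rates)[:size]) >= rd_value


# Example with n = 5 encoders, t = 1 traitor, target distortion D = 0.11.
target = rd_uniform_binary(0.11)
symmetric_ok = in_region([0.3] * 5, 1, target)
lopsided_ok = in_region([0.9, 0.9, 0.1, 0.1, 0.1], 1, target)
```

The second example shows why the constraint is binding on the weakest encoders: two high-rate encoders do not help if the remaining $n - 2t$ rates sum to less than $R(D)$.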


4.2  A  Separation-based  Achievability  Scheme 

The achievability scheme we present in order to prove Theorem 14 consists of two stages: rate-distortion quantization and adversarial error correction. Our coding scheme separates the lossy source coding part of the problem from the adversarial error correction part. The lossy source coding part is taken care of in the first stage. The second stage deals with adversarial error correction, treating the quantized sequences generated in the first stage as a message to be transmitted over a channel with adversarial errors. The first stage corresponds to source coding (rate-distortion quantization) and the second stage corresponds to channel coding for transmitting the quantized sequences from the first step over the non-stochastic, packetized, adversarial channel depicted in Figure 4.1, where the original message $W$ is transmitted to the decoder in the form of $n$ packets, $t$ of which are corrupted by traitors. Source-channel separation dictates that reliable communication can occur as long as $R(D) < C$, where $C$ is the capacity of the channel. In Section 4.6 we show that the capacity of the channel shown in Figure 4.1 is in fact $\min_{S : |S| = n - 2t} \sum_{i \in S} R_i$. With source-channel separation, reliable communication can occur as long as $R(D) < \min_{S : |S| = n - 2t} \sum_{i \in S} R_i$, which is the statement of Theorem 14.


Figure  4.1:  A  non-stochastic,  packetized  adversarial  channel  where  the  original 
message  is  transmitted  as  n  packets,  t  of  which  are  corrupted  by  traitors. 

Proof of Theorem 14. Choose $\epsilon > 0$, $\delta > 0$, and $0 < \alpha < (n - 2t)\epsilon$. Given the source distribution $p(x)$, fix $p(\hat{x} | x)$ such that $I(X; \hat{X}) = R(D)$. Compute $p(\hat{x}) = \sum_x p(x) p(\hat{x} | x)$.

Rate-distortion quantization: Fix a blocklength $l$, and generate a codebook $\mathcal{C}$ consisting of $2^{l(R(D) + \alpha)}$ sequences $\hat{X}^l$ drawn randomly and i.i.d. from the marginal distribution $p(\hat{x})$. Index the sequences in $\mathcal{C}$ by $w \in \{1, \ldots, 2^{l(R(D) + \alpha)}\}$.

Random binning: For all $i \in \mathcal{N}$, Encoder $i$ bins the $2^{l(R(D) + \alpha)}$ sequences in $\mathcal{C}$ uniformly and independently into $2^{l(R_i + \epsilon)}$ bins.

Encoding: Observe a length-$l$ source sequence $X^l$ and find a $w$ such that $(X^l, \hat{X}^l(w))$ are distortion typical [33, p. 319]. If there is more than one such $w$, pick $w$ to be the least one. If there is no such $w$, set $w = 1$. Let $b_i = f_i(\hat{X}^l(w))$ be the bin index of $\hat{X}^l(w)$ at Encoder $i$. Encoder $i$ transmits $b_i$ to the decoder.

Decoding: For each set of $n - t$ messages, the decoder attempts to generate a reconstruction of $X^l$. In particular, for $H \subseteq \mathcal{N}$, $|H| = n - t$, the decoder searches the bins indexed by $b_i$, $i \in H$, for a sequence $\hat{X}_H^l$ such that $f_i(\hat{X}_H^l) = b_i$ for all $i \in H$. If there is exactly one such sequence $\hat{X}_H^l$ in the bins indexed by $b_i$, $i \in H$, set $\hat{X}_H^l$ to be the reconstruction for the set $H$. If there is no such sequence, or there is more than one such sequence, set $\hat{X}_H^l = \emptyset$.

Consider now the $\binom{n}{t}$ sequences $\hat{X}_H^l$ the decoder generates for $H \subseteq \mathcal{N}$, $|H| = n - t$. If there exists exactly one sequence $\hat{X}^l$ such that $\hat{X}_H^l = \hat{X}^l$ for all $\hat{X}_H^l \ne \emptyset$, output $\hat{X}^l$ as the reconstruction of $X^l$. If $\hat{X}_H^l = \emptyset$ for all $H$, or if $\hat{X}_{H_1}^l \ne \hat{X}_{H_2}^l$ for some $H_1, H_2 \subseteq \mathcal{N}$, output $\hat{x}_0^l$ as the reconstruction.
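The binning and decoding steps above can be exercised on a toy instance. This is a sketch under stated assumptions rather than the scheme's actual parameters: codewords are represented only by their indices, the values $n = 5$, $t = 1$, 32 codewords and 16 bins per encoder are arbitrary, and all function names are hypothetical. Mirroring the random-coding argument, the code redraws the bins until it finds a binning in which no two codewords collide at $n - 2t$ encoders.

```python
import itertools
import random

N_ENCODERS, N_TRAITORS = 5, 1      # n and t (toy values)
CODEBOOK_SIZE, N_BINS = 32, 16     # number of codewords, bins per encoder
FALLBACK = None                    # stands in for the default reconstruction


def draw_bins(seed):
    """bins[i][w] is the bin index that Encoder i assigns to codeword w."""
    rng = random.Random(seed)
    return [[rng.randrange(N_BINS) for _ in range(CODEBOOK_SIZE)]
            for _ in range(N_ENCODERS)]


def candidates(bins, subset, messages):
    """Codewords whose bin matches the received message at every encoder in subset."""
    return [w for w in range(CODEBOOK_SIZE)
            if all(bins[i][w] == messages[i] for i in subset)]


def is_good(bins):
    """True if every n - 2t honest messages pin down every codeword uniquely."""
    size = N_ENCODERS - 2 * N_TRAITORS
    for w in range(CODEBOOK_SIZE):
        honest = [bins[i][w] for i in range(N_ENCODERS)]
        for subset in itertools.combinations(range(N_ENCODERS), size):
            if candidates(bins, subset, honest) != [w]:
                return False
    return True


def find_good_bins(max_seeds=1000):
    """Redraw the random bins until a good binning is found."""
    for seed in range(max_seeds):
        bins = draw_bins(seed)
        if is_good(bins):
            return bins
    raise RuntimeError("no good binning found; try more bins per encoder")


def decode(bins, messages):
    """Decode each size-(n - t) subset, then require all decisions to agree."""
    per_subset = []
    size = N_ENCODERS - N_TRAITORS
    for subset in itertools.combinations(range(N_ENCODERS), size):
        found = candidates(bins, subset, messages)
        per_subset.append(found[0] if len(found) == 1 else None)
    decided = {w for w in per_subset if w is not None}
    return decided.pop() if len(decided) == 1 else FALLBACK
```

With a good binning, a single traitor can at most make some subsets report no decision; it cannot make any subset report a different codeword, so the decoder still recovers the quantization index.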

Error analysis: There is at least one set $\mathcal{H}$ of $n - t$ encoders that are all honest. By virtue of the encoding strategy, there is guaranteed to be at least one sequence common to all the bins indexed by $b_i$, $i \in \mathcal{H}$. If there is only one such sequence (and this would be the true quantized sequence $\hat{X}^l$), the decoder would output this sequence as the reconstruction for $\mathcal{H}$. If, however, there is more than one sequence common to all the bins, then the decoder would set $\hat{X}_{\mathcal{H}}^l = \emptyset$. Define the error event

$$F_H : \hat{X}_H^l \ne \emptyset \text{ and } \hat{X}_H^l \ne \hat{X}^l, \quad \forall H \subseteq \mathcal{N},\ |H| = n - t.$$

Define $E = \{\hat{X}_{\mathcal{H}}^l = \emptyset\} \cup \left(\bigcup_H F_H\right)$. Observe now that since there are $t$ traitors, any set $H$ of size $n - t$ has at least $n - 2t$ honest encoders. Denote the honest encoders in $H$ by $S_H$. Note that the encoders in $S_H$ will send bin indices corresponding to the true quantized sequence $\hat{X}^l$. Denote by $E_{S_H}$ the event that the bins reported by $S_H$ contain more than one common sequence. Then $E_{S_H}^c$ is the event that the bins reported by $S_H$ contain exactly one common sequence (which would be $\hat{X}^l$). If $E_{S_H}^c$ occurs, then no matter what the traitors $S_H^c \cap H$ do, the decoder will output either $\hat{X}^l$ or $\emptyset$ for $H$ (this is because if the traitors choose to send the bin indices for $\hat{X}^l$, then the decoder would find $\hat{X}^l$ in all the bins in $H$, and therefore output $\hat{X}^l$; if, however, the traitors choose to send bin indices for a different sequence, then the decoder would not find that sequence in at least one of the bins reported by the honest encoders in $S_H$, and will therefore output $\emptyset$). Thus it is evident that $F_H$ will occur only if $E_{S_H}$ occurs. Hence $\Pr(F_H) \le \Pr(E_{S_H})$.

Now let $f_i^{-1}(f_i(\hat{X}^l))$ denote the preimage of the message that the $i$th encoder sends for $\hat{X}^l$. Define $E'_S = \left\{ \left| \bigcap_{i \in S} f_i^{-1}(f_i(\hat{X}^l)) \right| \ne 1 \right\}$. Thus $E'_S$ is the event that the bins corresponding to the messages sent by the encoders in $S$ do not contain exactly one common sequence. Notice that $E_{S_H} \subseteq \bigcup_{S : |S| = n - 2t} E'_S$ for all $H$, and therefore $E \subseteq \bigcup_{S : |S| = n - 2t} E'_S$.

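Suppose now that for any set $S$ of $n - 2t$ encoders, $\sum_{i \in S} R_i \ge R(D)$. We bound $\Pr(E)$ as follows. By the union bound,

$$\begin{aligned}
\Pr(E) &\le \sum_{S : |S| = n - 2t} \Pr(E'_S) \\
&= \sum_{S : |S| = n - 2t} \Pr\left(\exists\, \tilde{x}^l \in \mathcal{C},\ \tilde{x}^l \ne \hat{X}^l : f_i(\tilde{x}^l) = f_i(\hat{X}^l)\ \forall i \in S\right) \\
&= \sum_{C} p(C) \sum_{S : |S| = n - 2t} \Pr\left(\exists\, \tilde{x}^l \in \mathcal{C},\ \tilde{x}^l \ne \hat{X}^l : f_i(\tilde{x}^l) = f_i(\hat{X}^l)\ \forall i \in S \,\middle|\, \mathcal{C} = C\right) \\
&= \sum_{C} p(C) \sum_{\hat{x}^l} p(\hat{x}^l) \sum_{S : |S| = n - 2t} \Pr\left(\exists\, \tilde{x}^l \in \mathcal{C},\ \tilde{x}^l \ne \hat{x}^l : f_i(\tilde{x}^l) = f_i(\hat{x}^l)\ \forall i \in S \,\middle|\, \hat{X}^l = \hat{x}^l,\ \mathcal{C} = C\right) \\
&\le \sum_{C} p(C) \sum_{\hat{x}^l} p(\hat{x}^l) \sum_{S : |S| = n - 2t} \sum_{\tilde{x}^l \in C} \Pr\left(f_i(\tilde{x}^l) = f_i(\hat{x}^l)\ \forall i \in S \,\middle|\, \hat{X}^l = \hat{x}^l,\ \mathcal{C} = C\right) \\
&\le \sum_{C} p(C) \sum_{\hat{x}^l} p(\hat{x}^l) \sum_{S : |S| = n - 2t} |C| \, 2^{-l \sum_{i \in S} (R_i + \epsilon)} \\
&= \sum_{C} p(C) \sum_{\hat{x}^l} p(\hat{x}^l) \sum_{S : |S| = n - 2t} 2^{l(R(D) + \alpha)} \, 2^{-l \sum_{i \in S} (R_i + \epsilon)} \\
&\le \sum_{C} p(C) \sum_{\hat{x}^l} p(\hat{x}^l) \sum_{S : |S| = n - 2t} 2^{-l((n - 2t)\epsilon - \alpha)} \\
&= \sum_{C} p(C) \sum_{\hat{x}^l} p(\hat{x}^l) \binom{n}{2t} 2^{-l((n - 2t)\epsilon - \alpha)} \\
&= \binom{n}{2t} 2^{-l((n - 2t)\epsilon - \alpha)},
\end{aligned}$$

where the last inequality follows because $\sum_{i \in S} R_i \ge R(D)$. Notice now that if $E^c$ occurs, then the decoder outputs the true quantized sequence $\hat{X}^l$ for $\mathcal{H}$, and for every $H$, $H \ne \mathcal{H}$, the decoder either outputs $\hat{X}^l$ or $\emptyset$. Thus the decoder outputs $\hat{X}^l$ as the reconstruction of $X^l$. If, however, $E$ occurs, then the decoder may reconstruct the wrong quantized sequence. Let $l$ be sufficiently large so that, by the rate-distortion theorem, the distortion when $E^c$ occurs is less than $D + \delta$ when averaged over $X^l$ and $\mathcal{C}$. We have

$$\begin{aligned}
E_{f,g} E_X \left[\max_{\substack{H \subseteq \mathcal{N} \\ |H| = n - t}} \max_{C_{H^c}} \frac{1}{l} \sum_{t=1}^{l} d(X_t, \hat{X}_t)\right] &\le (D + \delta)(1 - \Pr(E)) + d_{\max} \Pr(E) \\
&\le D + \delta + d_{\max} \binom{n}{2t} 2^{-l((n - 2t)\epsilon - \alpha)}.
\end{aligned}$$

The right hand side can be made smaller than $D + \epsilon$ by letting $l \to \infty$ and then $\alpha, \delta \to 0$. Thus there exists a code that achieves $(R_1 + \epsilon, \ldots, R_n + \epsilon, D + \epsilon)$. □

4.3 Converse for Uniform Binary Sources with Hamming Distortion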

In this section, we prove that the achievability scheme in Section 4.2 is optimal for a uniform binary source with Hamming distortion. For Hamming distortion, given the binary source alphabet $\mathcal{X} = \{+, -\}$, the reconstruction space is $\hat{\mathcal{X}} = \mathcal{X} = \{+, -\}$. The Hamming distortion measure $d : \mathcal{X} \times \hat{\mathcal{X}} \to \{0, 1\}$ is given by

$$d(x, \hat{x}) = \begin{cases} 0 & \text{if } x = \hat{x} \\ 1 & \text{otherwise.} \end{cases}$$

Theorem 15. For a uniform binary source and Hamming distortion measure, $\mathcal{RD}^* = \overline{\mathcal{RD}}$.

Given a target distortion $D$ for some rate vector in $\mathcal{RD}^*$, the Shannon rate-distortion function for a uniform binary source with Hamming distortion is given by $R(D) = 1 - h(D)$, where $h(\cdot)$ is the binary entropy function. Therefore, in order to prove Theorem 15, we need to show that for any subset of encoders $S$ of size $n - 2t$, $D \ge h^{-1}\left(1 - \sum_{i \in S} R_i\right)$. Before proving Theorem 15, however, we shall state and prove a lemma which provides an upper bound on the number of binary strings of length $l$ such that any two strings differ in at most $2l\delta$ places, where $\delta \le 1/2$. In proving this lemma, we shall make use of a result of Kleitman [66], earlier conjectured by Paul Erdős: the number of binary strings of length $l$ such that any two strings differ in at most $2k$ places is at most $\sum_{i=0}^{k} \binom{l}{i}$.

Lemma 3. For any set $S$ of binary strings of length $l$, where $|S| \ge 2$, there exists a pair of strings in $S$ that differ in at least $2l\, h^{-1}\left(\frac{1}{l} \log(|S| - 1)\right) - 1$ places, where $h^{-1}$ is the inverse of the binary entropy function.


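Proof. Let $\delta = h^{-1}\left(\frac{1}{l} \log(|S| - 1)\right)$. Thus $\delta \le 1/2$ and $|S| - 1 = 2^{l h(\delta)}$. We have

$$\begin{aligned}
|S| &> |S| - 1 \\
&= (|S| - 1)(\delta + 1 - \delta)^l \\
&= (|S| - 1) \sum_{i=0}^{l} \binom{l}{i} \delta^i (1 - \delta)^{l - i} \\
&\ge (|S| - 1) \sum_{i=0}^{\lfloor l\delta \rfloor} \binom{l}{i} \delta^i (1 - \delta)^{l - i} \\
&= (|S| - 1) \sum_{i=0}^{\lfloor l\delta \rfloor} \binom{l}{i} (1 - \delta)^l \left(\frac{\delta}{1 - \delta}\right)^{i} \\
&\ge (|S| - 1) \sum_{i=0}^{\lfloor l\delta \rfloor} \binom{l}{i} (1 - \delta)^l \left(\frac{\delta}{1 - \delta}\right)^{l\delta} \\
&= (|S| - 1) \sum_{i=0}^{\lfloor l\delta \rfloor} \binom{l}{i} 2^{-l h(\delta)} \\
&= \sum_{i=0}^{\lfloor l\delta \rfloor} \binom{l}{i}.
\end{aligned}$$

By the aforementioned result in [66], $\sum_{i=0}^{\lfloor l\delta \rfloor} \binom{l}{i}$ is the maximum number of binary strings such that any two strings differ in at most $2\lfloor l\delta \rfloor$ places. Since $|S| > \sum_{i=0}^{\lfloor l\delta \rfloor} \binom{l}{i}$, there must exist a pair of strings in $S$ that differ in at least $2\lfloor l\delta \rfloor + 1 > 2(l\delta - 1) + 1 = 2l\delta - 1$ places. □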
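Lemma 3 can be sanity-checked by brute force for short strings. The sketch below is illustrative only: the function names, the bisection-based inverse of $h$, and the randomly sampled sets are assumptions introduced here. Strings are represented as integers so that the Hamming distance is the bit count of an XOR.

```python
import itertools
import math
import random


def binary_entropy(q):
    """Binary entropy function in bits."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)


def inverse_entropy(y):
    """Inverse of h restricted to [0, 1/2], found by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(100):
        mid = (lo + hi) / 2
        if binary_entropy(mid) < y:
            lo = mid
        else:
            hi = mid
    return lo  # slightly below the true inverse, so the bound stays valid


def lemma3_bound(set_size, length):
    """Lower bound 2*l*h^{-1}(log(|S| - 1) / l) - 1 on the largest pairwise distance."""
    return 2 * length * inverse_entropy(math.log2(set_size - 1) / length) - 1


def max_pairwise_distance(strings):
    """Largest Hamming distance between any two strings (given as integers)."""
    return max(bin(a ^ b).count("1")
               for a, b in itertools.combinations(strings, 2))


rng = random.Random(1)
LENGTH = 8
results = []
for size in (2, 3, 10, 50, 200, 256):
    strings = rng.sample(range(2 ** LENGTH), size)  # distinct length-8 strings
    results.append((max_pairwise_distance(strings), lemma3_bound(size, LENGTH)))
```

Each pair in `results` holds the observed maximum distance and the bound from the lemma; the observed value is never smaller.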


We are now in a position to prove Theorem 15. Let $(f_1, \ldots, f_n, g)$ be a code that achieves the rate-distortion vector $(R_1, \ldots, R_n, D)$, and let $S = \{1, \ldots, n - 2t\}$. Define, for any given source sequence $x^l$, the set $M(x^l) = \{\tilde{x}^l \in \mathcal{X}^l : f_i(\tilde{x}^l) = f_i(x^l)\ \forall i \in S\}$. Let $M_{c_S}$, $c_S \in \{1, \ldots, 2^{l \sum_{i \in S} R_i}\}$, be the values taken by the set $M(X^l)$. Thus $M_{c_S}$ is the pre-image of the set of codewords $c_S$. Since there are $2^{l \sum_{i \in S} R_i}$ sets of codewords covering $2^l$ sequences, we have

$$\frac{1}{2^{l \sum_{i \in S} R_i}} \sum_{c_S = 1}^{2^{l \sum_{i \in S} R_i}} |M_{c_S}| = \frac{2^l}{2^{l \sum_{i \in S} R_i}} = 2^{l(1 - \sum_{i \in S} R_i)}.$$

Suppose that the set $S$ contains honest encoders only, and the traitors constitute either the set of encoders $T_1 = \{n - 2t + 1, \ldots, n - t\}$ or the set $T_2 = \{n - t + 1, \ldots, n\}$. Suppose further that $(x')^l$ is the observed source sequence, and the encoders in $S$ send codewords $c'_S$ to the decoder. Thus $(x')^l \in M_{c'_S}$. Since there are $|M_{c'_S}|$ sequences in $M_{c'_S}$, Lemma 3 tells us that there exists a sequence $(x'')^l \in M_{c'_S}$ such that $d((x')^l, (x'')^l) \ge 2l\delta - 2$ bits, where $\delta = h^{-1}\left(\frac{1}{l} \log(|M_{c'_S}| - 1)\right)$. Suppose $T_1$ is the honest set, and the encoders in $T_1$ send the codewords corresponding to $(x')^l$. Then the set $T_2$ of traitors could send codewords corresponding to the fake sequence $(x'')^l$. Thus the decoder would receive the set of messages $(c'_S, c_{T_1}(x'), c_{T_2}(x''))$. Note, however, that the same set of messages would be received by the decoder if $(x'')^l$ were the true source sequence and $T_1$ rather than $T_2$ were the traitorous set, and the traitors decided to report $(x')^l$ to the decoder. In either case, the decoder must output the same reconstruction, say $\hat{x}^l$, since it cannot distinguish between the two cases. We thus have the following
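sequence of inequalities:

$$\begin{aligned}
\sum_{x^l \in M_{c'_S}} \max_{i = 1, 2} \max_{c_{T_i}} \sum_{t=1}^{l} d(x_t, \hat{x}_t)
&\ge \max_{c_{T_2}} \sum_{t=1}^{l} d(x'_t, \hat{x}_t) + \max_{c_{T_1}} \sum_{t=1}^{l} d(x''_t, \hat{x}_t) + \sum_{\substack{x^l \in M_{c'_S} \\ x^l \ne (x')^l, (x'')^l}} \max_{i = 1, 2} \max_{c_{T_i}} \sum_{t=1}^{l} d(x_t, \hat{x}_t) \\
&\ge \sum_{t=1}^{l} d(x'_t, \hat{x}_t) + \sum_{t=1}^{l} d(x''_t, \hat{x}_t) + \sum_{\substack{x^l \in M_{c'_S} \\ x^l \ne (x')^l, (x'')^l}} \max_{i = 1, 2} \max_{c_{T_i}} \sum_{t=1}^{l} d(x_t, \hat{x}_t) \\
&\ge \sum_{t=1}^{l} d(x'_t, x''_t) + \sum_{\substack{x^l \in M_{c'_S} \\ x^l \ne (x')^l, (x'')^l}} \max_{i = 1, 2} \max_{c_{T_i}} \sum_{t=1}^{l} d(x_t, \hat{x}_t) \\
&\ge 2l\, h^{-1}\left(\frac{1}{l} \log(|M_{c'_S}| - 1)\right) - 2 + \sum_{\substack{x^l \in M_{c'_S} \\ x^l \ne (x')^l, (x'')^l}} \max_{i = 1, 2} \max_{c_{T_i}} \sum_{t=1}^{l} d(x_t, \hat{x}_t),
\end{aligned}$$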


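where the penultimate inequality follows from the triangle inequality. We can now remove $(x')^l$ and $(x'')^l$ from $M_{c'_S}$ and apply Lemma 3 to the remaining $|M_{c'_S}| - 2$ sequences. We can do this iteratively, stopping when 3 or fewer sequences remain. This yields the lower bound

$$\sum_{x^l \in M_{c'_S}} \max_{i = 1, 2} \max_{c_{T_i}} \sum_{t=1}^{l} d(x_t, \hat{x}_t) \ge \sum_{j=0}^{|M_{c'_S}| - 2} \left( l\, h^{-1}\left(\frac{1}{l} \log(|M_{c'_S}| - 1 - j)\right) - 1 \right).$$

Let $N = \sum_{c_S = 1}^{2^{l \sum_{i \in S} R_i}} (|M_{c_S}| - 1) = 2^l - 2^{l \sum_{i \in S} R_i}$. Now

$$\begin{aligned}
E\left[\max_{H} \max_{C_{H^c}} \frac{1}{l} \sum_{t=1}^{l} d(X_t, \hat{X}_t)\right]
&\ge E\left[\max_{i = 1, 2} \max_{C_{T_i}} \frac{1}{l} \sum_{t=1}^{l} d(X_t, \hat{X}_t)\right] \\
&= \sum_{c_S = 1}^{2^{l \sum_{i \in S} R_i}} \sum_{x^l \in M_{c_S}} \max_{i = 1, 2} \max_{c_{T_i}} \frac{1}{l} \sum_{t=1}^{l} d(x_t, \hat{x}_t)\, p(x^l) \\
&\ge 2^{-l} \sum_{c_S = 1}^{2^{l \sum_{i \in S} R_i}} \sum_{j=0}^{|M_{c_S}| - 2} \left( h^{-1}\left(\frac{1}{l} \log(|M_{c_S}| - 1 - j)\right) - \frac{1}{l} \right) \\
&\stackrel{(a)}{\ge} h^{-1}\left(\frac{1}{N} \sum_{c_S = 1}^{2^{l \sum_{i \in S} R_i}} \sum_{j=0}^{|M_{c_S}| - 2} \frac{1}{l} \log(|M_{c_S}| - 1 - j)\right) 2^{-l} N - \frac{1}{l} 2^{-l} N \\
&\stackrel{(b)}{\ge} h^{-1}\left(\frac{1}{lN} \sum_{c_S = 1}^{2^{l \sum_{i \in S} R_i}} \frac{(|M_{c_S}| - 1) \ln(|M_{c_S}| - 1) - (|M_{c_S}| - 1)}{\ln 2}\right) 2^{-l} N - \frac{1}{l} 2^{-l} N \\
&= h^{-1}\left(\frac{1}{lN} \left[\sum_{c_S = 1}^{2^{l \sum_{i \in S} R_i}} (|M_{c_S}| - 1) \log(|M_{c_S}| - 1) - \frac{1}{\ln 2} \sum_{c_S = 1}^{2^{l \sum_{i \in S} R_i}} (|M_{c_S}| - 1)\right]\right) 2^{-l} N - \frac{1}{l} 2^{-l} N \\
&\ge h^{-1}\left(\frac{1}{lN} \sum_{c_S = 1}^{2^{l \sum_{i \in S} R_i}} (|M_{c_S}| - 1) \log(|M_{c_S}| - 1) - \frac{1}{l \ln 2}\right) 2^{-l} N - \frac{1}{l} \\
&\stackrel{(c)}{\ge} h^{-1}\left(\frac{1}{lN}\, 2^{l \sum_{i \in S} R_i} \left[\frac{\sum_{c_S} (|M_{c_S}| - 1)}{2^{l \sum_{i \in S} R_i}}\right] \log\left[\frac{\sum_{c_S} (|M_{c_S}| - 1)}{2^{l \sum_{i \in S} R_i}}\right] - \frac{1}{l \ln 2}\right) 2^{-l} N - \frac{1}{l} \\
&= h^{-1}\left(\frac{1}{lN}\, 2^{l \sum_{i \in S} R_i}\, \frac{N}{2^{l \sum_{i \in S} R_i}} \log\left(\frac{N}{2^{l \sum_{i \in S} R_i}}\right) - \frac{1}{l \ln 2}\right) 2^{-l} N - \frac{1}{l} \\
&= h^{-1}\left(\frac{1}{l} \log\left(2^{l(1 - \sum_{i \in S} R_i)} - 1\right) - \frac{1}{l \ln 2}\right) \left(1 - 2^{-l(1 - \sum_{i \in S} R_i)}\right) - \frac{1}{l},
\end{aligned}$$

where (a) follows from the convexity of $h^{-1}(x)$ in $x$, (b) follows from the fact that $\sum_{x=1}^{m} \ln x \ge m \ln m - m$ and because $h^{-1}(x)$ is nondecreasing in $x$, and (c) follows from the convexity of $x \log x$ in $x$ and because $h^{-1}(x)$ is nondecreasing in $x$. Letting $l \to \infty$ completes the proof.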


4.4 Converse for Gaussian Sources with Squared Error Distortion

In this section, we prove that the achievability scheme in Section 4.2 is optimal for a Gaussian source with squared error distortion. The squared error distortion measure $d : \mathbb{R} \times \mathbb{R} \to \mathbb{R}_+$ is given by $d(x, \hat{x}) = (x - \hat{x})^2$.

Theorem 16. For a Gaussian source and squared error distortion measure, if there exists a reconstruction symbol $\hat{x}$ such that $E[d(X, \hat{x})]$ is finite, then $\mathcal{RD}^* = \overline{\mathcal{RD}}$.

Given a target distortion $D$, the Shannon rate-distortion function for a Gaussian source with squared error distortion is given by $R(D) = \frac{1}{2} \log \frac{\sigma^2}{D}$, where $\sigma^2$ is the variance of the source. Therefore, in order to prove Theorem 16, we need to show that for any subset of encoders $S$ of size $n - 2t$, $\sum_{i \in S} R_i \ge \frac{1}{2} \log \frac{\sigma^2}{D}$. Let $(f_1, \ldots, f_n, g)$ be a code that achieves the rate-distortion vector $(R_1, \ldots, R_n, D)$, and let $C_i$ be the codeword transmitted by the $i$th encoder. For any $S \subseteq \mathcal{N}$, $|S| = n - 2t$, we have

$$\begin{aligned}
\sum_{i \in S} R_i &\ge \frac{1}{l} \sum_{i \in S} H(C_i) \\
&\ge \frac{1}{l} H(C_S) \\
&\ge \frac{1}{l} I(X^l; C_S) \\
&= \frac{1}{l} h(X^l) - \frac{1}{l} h(X^l | C_S) \\
&= \frac{1}{2} \log 2\pi e \sigma^2 - \frac{1}{l} h(X^l | C_S).
\end{aligned}$$

Thus, in order to prove Theorem 16, it suffices to show that $\frac{1}{l} h(X^l | C_S) \le \frac{1}{2} \log 2\pi e D$. Let $H_1$ and $H_2$ be two sets in $\mathcal{N}$ such that $|H_1| = |H_2| = n - t$ and $H_1 \cap H_2 = S$. Define

$$Q_{D'} = \left\{x^l : \max_{i = 1, 2} \max_{C_{H_i^c}} \frac{1}{l} \sum_{t=1}^{l} d(x_t, \hat{x}_t) \le D'\right\}.$$

For every codeword $c_S$, let $Q_{D' | c_S} = \{x^l \in Q_{D'} : f_S(x^l) = c_S\}$. Then

$$\begin{aligned}
\Pr(X^l \in Q_{D'}) &= \sum_{c_S} \Pr(C_S = c_S) \Pr(X^l \in Q_{D'} | C_S = c_S) \\
&\le \sum_{c_S} \Pr(C_S = c_S) \Pr(X^l \in Q_{D' | c_S} | C_S = c_S) \\
&= \Pr(X^l \in Q_{D' | C_S}). \qquad (4.1)
\end{aligned}$$

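Since the code achieves distortion $D$, we have

$$\begin{aligned}
D &\ge E\left[\max_{\substack{H \subseteq \mathcal{N} \\ |H| = n - t}} \max_{C_{H^c}} \frac{1}{l} \sum_{t=1}^{l} d(X_t, \hat{X}_t)\right] \\
&\ge E\left[\max_{i = 1, 2} \max_{C_{H_i^c}} \frac{1}{l} \sum_{t=1}^{l} d(X_t, \hat{X}_t)\right] \\
&= \int_0^\infty \Pr\left[\max_{i = 1, 2} \max_{C_{H_i^c}} \frac{1}{l} \sum_{t=1}^{l} d(X_t, \hat{X}_t) > D'\right] dD' \\
&\ge \int_0^\infty \Pr\left[X^l \notin Q_{D'}\right] dD' \\
&\ge \int_0^\infty \Pr\left[X^l \notin Q_{D' | C_S}\right] dD', \qquad (4.2)
\end{aligned}$$

where the last inequality follows from (4.1).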


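Let $\tilde{D} = \inf\{D' : X^l \in Q_{D' | C_S}\}$. Since the set $Q_{D' | c_S}$ is non-decreasing in $D'$, the event $\{X^l \notin Q_{D' | C_S}\}$ is identical to the event $\{\tilde{D} > D'\}$. Hence from (4.2),

$$D \ge \int_0^\infty \Pr(\tilde{D} > D')\, dD' = E(\tilde{D}).$$

Fix $\Delta > 0$ and let $\tilde{D}_\Delta = \Delta \lceil \tilde{D} / \Delta \rceil$ be a quantized version of $\tilde{D}$. Observe that since $\tilde{D}_\Delta \le \tilde{D} + \Delta$,

$$E(\tilde{D}_\Delta) \le E(\tilde{D} + \Delta) \le D + \Delta. \qquad (4.3)$$

We then have

$$\begin{aligned}
\frac{1}{l} h(X^l | C_S) &= \frac{1}{l} h(X^l | C_S, \tilde{D}_\Delta) + \frac{1}{l} I(X^l; \tilde{D}_\Delta | C_S) \\
&\le \frac{1}{l} h(X^l | C_S, \tilde{D}_\Delta) + \frac{1}{l} H(\tilde{D}_\Delta). \qquad (4.4)
\end{aligned}$$

Consider the first term in (4.4). Note that $\tilde{D} \le \tilde{D}_\Delta$, so $X^l \in Q_{\tilde{D}_\Delta | C_S}$. Therefore, by the uniform bound on entropy,

$$h(X^l | C_S, \tilde{D}_\Delta) \le E\left[\log \mathrm{Vol}(Q_{\tilde{D}_\Delta | C_S})\right]. \qquad (4.5)$$

Now consider the second term in (4.4). Since $\tilde{D}_\Delta$ is quantized, it can be shown using a maximum entropy distribution result that

$$H(\tilde{D}_\Delta) \le \frac{E(\tilde{D}_\Delta)}{\Delta}\, h\left(\frac{\Delta}{E(\tilde{D}_\Delta)}\right), \qquad (4.6)$$

where $h(q) = -q \log q - (1 - q) \log(1 - q)$ is the binary entropy function. The right hand side of (4.6) is increasing in $E(\tilde{D}_\Delta)$, so using (4.3) gives

$$H(\tilde{D}_\Delta) \le \frac{D + \Delta}{\Delta}\, h\left(\frac{\Delta}{D + \Delta}\right). \qquad (4.7)$$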
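The maximum entropy bound in (4.6) is the statement that a positive-integer-valued random variable with mean $\mu$ has entropy at most $\mu\, h(1/\mu)$, with equality for the geometric distribution; here the integer variable is $\tilde{D}_\Delta / \Delta$. The following sketch checks this on a few probability mass functions. The function names and the example distributions are illustrative assumptions.

```python
import math


def binary_entropy(q):
    """Binary entropy function in bits."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)


def entropy(pmf):
    """Shannon entropy in bits of a probability mass function given as a list."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)


def mean_of(pmf):
    """Mean when pmf[k] is the probability of the value k + 1 (support {1, 2, ...})."""
    return sum((k + 1) * p for k, p in enumerate(pmf))


def geometric_bound(mean):
    """Maximum entropy mu * h(1 / mu) of a positive-integer variable with mean mu."""
    return mean * binary_entropy(1.0 / mean)


def truncated_geometric(success, terms):
    """Geometric distribution on {1, ..., terms}, renormalized after truncation."""
    weights = [success * (1 - success) ** k for k in range(terms)]
    total = sum(weights)
    return [w / total for w in weights]


examples = {
    "uniform on {1,2,3,4}": [0.25, 0.25, 0.25, 0.25],
    "point mass at 1": [1.0],
    "skewed": [0.7, 0.1, 0.1, 0.05, 0.05],
    "geometric(1/2)": truncated_geometric(0.5, 60),
}
checks = {name: (entropy(pmf), geometric_bound(mean_of(pmf)))
          for name, pmf in examples.items()}
```

In `checks`, each entropy is at most its bound, and the geometric example meets the bound up to truncation error.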


Now consider two sequences $x^l, x'^l \in Q_{D'|c_S}$. Suppose the decoder receives the set of codewords $(c_S,\ c_{H_1 \setminus H_2} = f_{H_1 \setminus H_2}(x^l),\ c_{H_2 \setminus H_1} = f_{H_2 \setminus H_1}(x'^l))$. First observe that this set of messages could have been produced if $X^l = x^l$ and $H_1$ were the set of honest encoders. Then the nodes in $H_2 \setminus H_1$, which are all traitors, could send $c_{H_2 \setminus H_1}$. Since $x^l \in Q_{D'|c_S}$, the estimate $\hat{x}^l$ must by definition satisfy $\frac{1}{l} d(x^l, \hat{x}^l) \le D'$. However, the same set of messages could have been produced if $X^l = x'^l$ and $H_2$ were the set of honest encoders, and the traitors $H_1 \setminus H_2$ decide to send $c_{H_1 \setminus H_2}$. Since the decoder produces just one estimate for a given set of received codewords, the very same estimate $\hat{x}^l$, by the same reasoning, must satisfy $\frac{1}{l} d(x'^l, \hat{x}^l) \le D'$. Hence we have

$$\frac{1}{l}\sum_{t=1}^{l} \left(x(t) - \hat{x}(t)\right)^2 \le D', \qquad \frac{1}{l}\sum_{t=1}^{l} \left(x'(t) - \hat{x}(t)\right)^2 \le D',$$

which can be rewritten as

$$\|x - \hat{x}\|_2 \le \sqrt{l D'}, \qquad \|x' - \hat{x}\|_2 \le \sqrt{l D'}.$$

Therefore, by the triangle inequality, for any $x^l, x'^l \in Q_{D'|c_S}$, $\|x - x'\|_2 \le 2\sqrt{l D'}$. Thus $Q_{D'|c_S}$ has diameter at most $2\sqrt{l D'}$. The following lemma from [67] upper bounds the volume of subsets of $\mathbb{R}^l$ as a function of their diameter.

Lemma 4. The volume of any subset of $\mathbb{R}^l$ is no more than that of the $l$-ball with the same diameter.

Lemma 4 tells us that the volume of $Q_{D'|c_S}$ is no more than the volume of an $l$-ball with radius $\sqrt{l D'}$. The latter can be shown to be less than $(2\pi e D')^{l/2}$. Combining this with (4.5) and (4.7) gives

$$\begin{aligned}
\frac{1}{l} h(X^l \mid C_S) &\le \frac{1}{l} \mathrm{E}\left[\log (2\pi e \tilde{D}_\Delta)^{l/2}\right] + \frac{1}{l} \cdot \frac{D + \Delta}{\Delta}\, h\!\left(\frac{\Delta}{D + \Delta}\right) \\
&\le \frac{1}{2} \log\left(2\pi e\, \mathrm{E}[\tilde{D}_\Delta]\right) + \frac{1}{l} \cdot \frac{D + \Delta}{\Delta}\, h\!\left(\frac{\Delta}{D + \Delta}\right) \\
&\le \frac{1}{2} \log\left(2\pi e (D + \Delta)\right) + \frac{1}{l} \cdot \frac{D + \Delta}{\Delta}\, h\!\left(\frac{\Delta}{D + \Delta}\right),
\end{aligned}$$

where the penultimate inequality follows from the concavity of $\log x$ in $x$ and the last inequality follows from (4.3). Letting $l \to \infty$ and then $\Delta \to 0$ completes the proof.
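The volume estimate used above, namely that an $l$-ball of radius $\sqrt{l D'}$ has volume below $(2\pi e D')^{l/2}$, can be sanity-checked numerically. The following is a minimal sketch that works in natural-log volumes to avoid overflow and relies only on the closed-form ball volume $\pi^{l/2} r^l / \Gamma(l/2 + 1)$. The function names are illustrative assumptions, not notation from the thesis.

```python
import math


def log_ball_volume(dim: int, radius: float) -> float:
    """Natural log of the volume of a Euclidean ball in `dim` dimensions."""
    return (
        0.5 * dim * math.log(math.pi)
        + dim * math.log(radius)
        - math.lgamma(0.5 * dim + 1.0)
    )


def log_volume_bound(dim: int, d_prime: float) -> float:
    """Natural log of the bound (2 * pi * e * D')^(dim / 2)."""
    return 0.5 * dim * math.log(2.0 * math.pi * math.e * d_prime)


def per_dimension_gap(dim: int, d_prime: float) -> float:
    """Slack between the bound and the true log-volume, per dimension."""
    radius = math.sqrt(dim * d_prime)
    return (log_volume_bound(dim, d_prime) - log_ball_volume(dim, radius)) / dim
```

The per-dimension gap is positive and shrinks as the blocklength grows, which is why the bound costs nothing in the limit.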


4.5  Uniform  Binary  Sources  with  Erasure  Distortion 

In this section, we show that for uniform binary sources with erasure distortion, the factor-of-2 rule is pessimistic, and there exists a coding scheme which can achieve points outside the rate region proposed in Section 4.1. We will consider a special case of the 3-channel Byzantine multiple descriptions problem in which one of the channels transmits at rate $R$ and the other two transmit at rate 1. One of the three encoders is a traitor. We shall henceforth refer to this special case as the $R$-1-1 problem. Assume without loss of generality that Encoder 1 transmits at rate $R$ and Encoders 2 and 3 transmit at rate 1. Thus Encoders 2 and 3 send the complete source sequence $X^l$ to the decoder, since their respective channels are not rate-constrained. Given the source alphabet $\mathcal{X} = \{+, -\}$, define the reconstruction space $\hat{\mathcal{X}} = \{+, -, 0\}$, where 0 denotes the erasure symbol. The erasure distortion measure is given by

$$d(x, \hat{x}) = \begin{cases} 0 & \text{if } \hat{x} = x \\ 1 & \text{if } \hat{x} = 0 \\ \infty & \text{otherwise.} \end{cases} \qquad (4.8)$$

Let $\mathcal{RD}_{\mathrm{eras}}$ be the set of achievable rate-distortion pairs as defined in Section 4.1, and let $\overline{\mathcal{RD}}_{\mathrm{eras}}$ denote its closure.


Theorem 17. $\overline{\mathcal{RD}}_{\mathrm{eras}} = \{(R, D) : D \ge 2h^{-1}(1 - R)\}$, where $h^{-1}(\cdot)$ is the inverse of the binary entropy function $h$.


Proof (Achievability). Note that $2h^{-1}(1 - R) \le 1 - R$, which is the distortion-rate function for a uniform binary source with erasure distortion. We will show that for any $D$ and any $R > 1 - h(D/2)$ (equivalently $D > 2h^{-1}(1 - R)$), the rate-distortion pair $(R, D)$ is achievable. In particular, we will show that for any $\epsilon > 0$, there exists a code with rate less than $R + \epsilon$ and distortion less than $D + \epsilon$. Define $\tilde{D} = D/2$, and let $R > 1 - h(\tilde{D})$. Let $p(\hat{x}|x)$ denote a binary symmetric channel (BSC) with crossover probability $\tilde{D}$. We construct an encoder similar to a rate-distortion encoder for a binary symmetric source (BSS) with Hamming distortion.

Random codebook generation: Compute $p(\hat{x}) = \sum_x p(x) p(\hat{x}|x)$. Fix a blocklength $l$, and generate $2^{lR} + 1$ sequences $\hat{X}^l$ drawn randomly and i.i.d. from the marginal distribution $p(\hat{x})$. Assign each codeword an index $w \in \{0, 1, \ldots, 2^{lR}\}$. The codebook is revealed to the encoders and the decoder.

Encoding: Choose $\delta > 0$. Encoder 1 observes a length-$l$ source sequence $X^l$, and encodes $X^l$ by $w$, $w \ne 0$, if $X^l$ and $\hat{X}^l(w)$ are jointly typical, i.e., the Hamming distance between $X^l$ and $\hat{X}^l(w)$ is less than $l(\tilde{D} + \delta)$. If there is more than one such $w$, the smallest is used. If there is no such $w \in \{1, \ldots, 2^{lR}\}$, Encoder 1 sends $w = 0$. Since $lR + 1$ bits are required to describe the $2^{lR} + 1$ indices, the rate of this code is $R + 1/l$. Encoders 2 and 3 send the whole sequence $X^l$.

Decoding: If Encoders 2 and 3 send the same source sequence $X^l$, then $X^l$ is the true source sequence, since at least one of Encoders 2 and 3 is honest. Output $\hat{X}^l = X^l$ as the reconstruction. If Encoders 2 and 3 send different source sequences, then one of them is the traitor, and Encoder 1 is honest. In this case, if Encoder 1 sent $w = 0$, output the all-erasure string. Otherwise, if one of the sequences sent by Encoders 2 and 3 is not jointly typical with the index sent by Encoder 1, then that encoder is the traitor. Output the sequence sent by the other encoder as the reconstruction. If, however, the sequences sent by Encoders 2 and 3 are both jointly typical with the index sent by Encoder 1, output a reconstruction $\hat{X}^l$ that has the same value as both sequences on the bits where they agree, and has erasures on the bits where they differ.
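The decoding rule can be sketched in a few lines. This is an illustration under stated assumptions rather than the thesis's implementation: source symbols are represented as +1 and -1, the erasure symbol as 0, joint typicality is reduced to the Hamming-distance test from the encoding step, and the function receives the codeword that Encoder 1's index points to (or `None` when the index is 0). The names `decode`, `hamming` and `max_dist` are hypothetical.

```python
from typing import List, Optional, Sequence

ERASURE = 0  # erasure symbol; source symbols are +1 and -1


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions in which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(
    seq2: Sequence[int],
    seq3: Sequence[int],
    codeword: Optional[Sequence[int]],
    max_dist: float,
) -> List[int]:
    """Decoder for the R-1-1 problem.

    seq2, seq3: sequences reported by Encoders 2 and 3.
    codeword:   codeword selected by Encoder 1's index, or None if w = 0.
    max_dist:   typicality threshold l * (D_tilde + delta).
    """
    # Agreement: at least one of Encoders 2 and 3 is honest, so this is the truth.
    if list(seq2) == list(seq3):
        return list(seq2)
    # Disagreement means Encoder 1 is honest; w = 0 forces the all-erasure output.
    if codeword is None:
        return [ERASURE] * len(seq2)
    typical2 = hamming(seq2, codeword) < max_dist
    typical3 = hamming(seq3, codeword) < max_dist
    if typical2 and not typical3:
        return list(seq2)
    if typical3 and not typical2:
        return list(seq3)
    # Both candidates are typical: keep agreed bits, erase the rest.
    # (If neither were typical, which an honest Encoder 1 rules out,
    # this branch is still safe because it never outputs a disputed bit.)
    return [a if a == b else ERASURE for a, b in zip(seq2, seq3)]
```

Note that the decoder never outputs a bit on which the two full-rate reports disagree, so it cannot incur infinite distortion whichever of Encoders 2 and 3 is the traitor.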

Distortion analysis: Let us consider the possible traitor locations and actions for each source sequence. If the traitor chooses to take over Encoder 1, then Encoders 2 and 3 will send the true source sequence and the decoder will be able to decode correctly regardless of the source sequence. Suppose the traitor takes over one of Encoder 2 or Encoder 3. Assume without loss of generality that the traitor takes over Encoder 3. Then Encoder 2 will send the true source sequence, and Encoder 1 will send an index that is jointly typical with the true source sequence. If the traitor sends the true source sequence, the decoder will be able to decode correctly, so suppose the traitor chooses to send a spurious sequence. If the fake sequence sent by the traitor is not jointly typical with the index sent by Encoder 1, the decoder will output the sequence sent by Encoder 2, which is the true source sequence. Suppose now that the traitor sends a source sequence which is jointly typical with the index sent by Encoder 1, but different from the true source sequence. Then the decoder will output a partially erased reconstruction based on the bits that are common between the true sequence and the fake sequence. Thus, for any source sequence for which Encoder 1 transmits an index $w \ne 0$, the only strategy for the traitor which will yield nonzero distortion is to send a source sequence which is jointly typical with the index sent by Encoder 1, but different from the true source sequence. It therefore makes sense for the traitor to pursue this strategy for every source sequence. In this case, let $X_2^l$ and $X_3^l$ be the true and fake sequences respectively. Since both $X_2^l$ and $X_3^l$ are jointly typical with $\hat{X}^l(w)$, where $w$ is the index sent by Encoder 1, the Hamming distance between $\hat{X}^l(w)$ and $X_2^l$ (and between $\hat{X}^l(w)$ and $X_3^l$) is at most $l(\tilde{D} + \delta)$. By the triangle inequality, therefore, the Hamming distance between the true sequence $X_2^l$ and the fake sequence $X_3^l$ is at most $2l(\tilde{D} + \delta)$.


Let $P_w = \{x^l \in \mathcal{X}^l : \text{the encoder transmits the index } w\}$. We thus have

$$\text{average distortion} \le \sum_{x^l \in P_0} p(x^l) \cdot 1 + \sum_{w \ne 0}\ \sum_{x^l \in P_w} p(x^l) \cdot 2(\tilde{D} + \delta),$$

since, when $w = 0$, the decoder outputs the all-erasure string (which yields distortion 1), and when $w \ne 0$, the traitor sends a fake sequence $X_3^l$ which differs in at most $2l(\tilde{D} + \delta)$ bits from the true source sequence, as described earlier. Thus the distortion when $w \ne 0$ is at most $2(\tilde{D} + \delta)$. Recall that the set $P_0$ (the set of sequences for which the encoder transmits $w = 0$) is the set of sequences for which no typical codeword can be found. Let $P_e$ be the total probability of these sequences. The total probability of the rest of the sequences can be bounded by 1. We thus have

$$\text{average distortion} \le 1 \cdot P_e + 2(\tilde{D} + \delta) \cdot 1 = 2\tilde{D} + 2\delta + P_e.$$


Since we are using a rate-distortion code of rate $R > R(\tilde{D}) = 1 - h(\tilde{D})$ for a BSS with Hamming distortion, $P_e$, averaged over a random choice of codebooks, can be made arbitrarily small as $l \to \infty$. Therefore, there exists a sufficiently large blocklength $l$ such that $1/l < \epsilon$ and $2\delta + P_e < \epsilon$. Thus there exists a code with rate $R + 1/l < R + \epsilon$ and average distortion less than $2\tilde{D} + \epsilon = D + \epsilon$. □


We  will  now  prove  the  converse  to  Theorem  17. 


Proof (Converse to Theorem 17). Let $(f, g)$ be a code that achieves the rate-distortion pair $(R, D)$. For this code, define, for any given source sequence $x^l$, the set $\mathcal{M}(x^l) = \{\tilde{x}^l \in \mathcal{X}^l : f(\tilde{x}^l) = f(x^l)\}$. Let $\mathcal{M}_i$, $i \in \{1, \ldots, 2^{lR}\}$, be the values taken by the set $\mathcal{M}(X^l)$. Thus $\mathcal{M}_i$ is the pre-image of the $i$th codeword. Since there are $2^{lR}$ codewords covering $2^l$ sequences, we have

$$\frac{1}{2^{lR}} \sum_{i=1}^{2^{lR}} |\mathcal{M}_i| = \frac{2^l}{2^{lR}} = 2^{l(1-R)}.$$

Suppose Encoder 3 is the traitor, and suppose that, unless the pre-image of the codeword sent by Encoder 1 contains a single source sequence (in which case Encoder 3 sends that source sequence), Encoder 3 always sends a fake source sequence $\tilde{X}^l$ which is in the pre-image of the codeword sent by Encoder 1 but is different from the sequence $X^l$ sent by Encoder 2 (which is the true source sequence). The true and fake sequences will differ in at least one bit. The situation, from the point of view of the decoder, is identical to the situation where $\tilde{X}^l$ is the true source sequence, $X^l$ is the fake source sequence, and Encoder 2, rather than Encoder 3, is the traitor. Since the distortion is maximized over all traitor locations and actions, the decoder cannot output either $+$ or $-$ for any bit in which $X^l$ and $\tilde{X}^l$ differ, since outputting either would result in infinite distortion under one of the two aforementioned scenarios. The decoder, therefore, must output an erasure for any bit in which $X^l$ and $\tilde{X}^l$ differ. Given a sequence $x^l \in \mathcal{M}_i$, let $D_{\mathcal{M}_i}(x^l)$ be the maximum Hamming distance of $x^l$ from any sequence in $\mathcal{M}_i$. In the extreme case that $|\mathcal{M}_i| = 2^l$, i.e., all sequences map to the same codeword, the traitor will be able to find a source sequence differing in $l$ bits for every sequence. Thus $D_{\mathcal{M}_i}(x^l) = l$ for all $x^l$, and the decoder would be forced to output the all-erasure string for every source sequence, resulting in a distortion of 1.

Now suppose $|\mathcal{M}_i| < 2^l$ for all $i$. Since there are $|\mathcal{M}_i|$ source sequences in $\mathcal{M}_i$, Lemma 3 tells us that the traitor will be able to find a sequence $x_w^l \in \mathcal{M}_i$ such that $D_{\mathcal{M}_i}(x_w^l) \ge 2l\delta - 2$ bits, where $\delta = h^{-1}\left(\frac{1}{l} \log |\mathcal{M}_i| - 1\right)$. We can remove $x_w^l$ from $\mathcal{M}_i$ and apply Lemma 1 to the remaining $|\mathcal{M}_i| - 1$ sequences. Doing this iteratively yields the lower bound
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Let  N  =  El=i(\Mi\  ~  1)  =  2'  -  2lR •  Now 
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where  (a)  follows  from  the  convexity  of  h~l{x )  in  x,  (b)  follows  from  the  fact 
that  Yd i  ^na:  —  ~  m  and  because  h~1(x)  is  nondecreasing  in  x,  and  (c) 

follows from the convexity of $x \log x$ in $x$ and because $h^{-1}(x)$ is nondecreasing in $x$. Letting $l \to \infty$ completes the proof. □

4.6 Channel Coding Theorem


In this section we make precise the fact that separation breaks for erasure distortion. We begin by proving a capacity result for the channel depicted in Figure 4.1. A set of messages, indexed by $\{1, \ldots, 2^{lR}\}$, is to be transmitted over the channel. Encoder $i$ encodes the message using the encoding function $f_i : \{1, \ldots, 2^{lR}\} \to \{1, \ldots, 2^{lR_i}\}$. If $i \in H$, then Encoder $i$'s codeword is $C_i = f_i(W)$, where $W$ is the message to be transmitted. If $i \in H^c$, then Encoder $i$ can choose $C_i$ arbitrarily, with full knowledge of $W$ and the other $n - 1$ codewords. The decoder employs the decoding function $g : \prod_{i=1}^{n} \{1, \ldots, 2^{lR_i}\} \to \{1, \ldots, 2^{lR}\}$ and produces an estimate $\hat{W} = g(C_1, \ldots, C_n)$ of the original message $W$.


For the message $w$, define the indicator function

$$1_{f,g}(w, H, c_{H^c}) = \begin{cases} 0 & \text{if } \hat{w} = w \\ 1 & \text{if } \hat{w} \ne w \end{cases}$$

given that the set of honest encoders is $H$ and the traitors transmit the codewords $c_{H^c}$. For the code $(f, g)$, we define the average probability of error as

$$P_e(f, g) = \frac{1}{2^{lR}} \sum_{w=1}^{2^{lR}}\ \max_{H \subset \mathcal{N},\ |H| = n-t}\ \max_{c_{H^c}}\ 1_{f,g}(w, H, c_{H^c}).$$

Definition 14. A rate $R$ is said to be achievable if there exists a sequence of codes $(f, g)$, indexed by the blocklength $l$, such that $P_e(f, g) \to 0$ as $l \to \infty$.


Theorem 18. All rates $R$ such that $R < \min_{S \subset \mathcal{N},\ |S| = n-2t} \sum_{i \in S} R_i$ are achievable. If $R > \min_{S \subset \mathcal{N},\ |S| = n-2t} \sum_{i \in S} R_i$, then $P_e(f, g) \to 1$ as $l \to \infty$.


The proof of achievability for Theorem 18 is very similar to the proof of Theorem 14 and is omitted. We prove the converse below. It is worth noting that Theorem 18 admits a strong converse.

Converse to Theorem 18. Suppose $R > \sum_{i=1}^{n-2t} R_i$ and let $T_1 = \{n - 2t + 1, \ldots, n - t\}$ and $T_2 = \{n - t + 1, \ldots, n\}$. Fix a code $(f_1, \ldots, f_n, g)$ and consider the first $n - 2t$ encoding functions $f_1, \ldots, f_{n-2t}$. Define $\mathcal{M}(w)$ to be the set of messages such that for all $w' \in \mathcal{M}(w)$, $f_i(w') = f_i(w)$, $i = 1, \ldots, n - 2t$. Note that given a random message $W$, the range of the random variable $\mathcal{M}(W)$ partitions the original set of messages $\{1, \ldots, 2^{lR}\}$. Since the sum rate of the first $n - 2t$ encoders is $\sum_{i=1}^{n-2t} R_i$, the number of partitions is $2^{l \sum_{i=1}^{n-2t} R_i}$. We can therefore label the partitions as $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_{2^{l \sum_{i=1}^{n-2t} R_i}}$. For each partition $\mathcal{M}_j$, we have the following two cases:

1. there exists $w \in \mathcal{M}_j$ such that if the encoders in $T_1$ transmit $f_i(w)$ for all $i \in T_1$, then the decoder outputs $\hat{W} = w$ regardless of the messages transmitted by encoders in $T_2$. We refer to $w$ as a "leader" message.

2. for all $w \in \mathcal{M}_j$, there exists a set of messages $C_i$, $i \in T_2$, such that if the encoders in $T_1$ transmit $f_i(w)$ for all $i \in T_1$ and the encoders in $T_2$ transmit $C_i$, $i \in T_2$, then the decoder outputs $\hat{W} \ne w$.

We now argue that for any message in a partition that is not a leader message, the traitors can cause the decoder to make an error by outputting a different message. If the first case holds, then the traitors simply have to take over the encoders in $T_1$ and transmit $f_i(w)$, $i \in T_1$, where $w$ is the leader message. If Case 2 holds, then the traitors simply have to take over $T_2$ and transmit the messages that would result in an error at the decoder. We can also argue that if there is more than one leader message in a partition, then the traitors can cause an error for every leader. More precisely, if $w$ and $w'$ are two leader messages, then the traitors can cause an error for $w$ by taking over $T_1$ and transmitting $f_i(w')$, $i \in T_1$. If there is only one leader in a partition, then, by the above argument, the traitors can cause errors for all messages other than the leader. Therefore, at most one message for every partition can be decoded correctly. Since there are $2^{l \sum_{i=1}^{n-2t} R_i}$ partitions, at most $2^{l \sum_{i=1}^{n-2t} R_i}$ messages can be correctly decoded. We can therefore compute the probability of error as follows:

$$\begin{aligned}
P_e(f, g) &= \frac{1}{2^{lR}} \sum_{w=1}^{2^{lR}}\ \max_{H \subset \mathcal{N},\ |H| = n-t}\ \max_{c_{H^c}}\ 1_{f,g}(w, H, c_{H^c}) \\
&= \frac{1}{2^{lR}} \sum_{j=1}^{2^{l \sum_{i=1}^{n-2t} R_i}}\ \sum_{w \in \mathcal{M}_j}\ \max_{H \subset \mathcal{N},\ |H| = n-t}\ \max_{c_{H^c}}\ 1_{f,g}(w, H, c_{H^c}) \\
&\ge \frac{1}{2^{lR}} \left( 2^{lR} - 2^{l \sum_{i=1}^{n-2t} R_i} \right) \\
&= 1 - 2^{-l \left( R - \sum_{i=1}^{n-2t} R_i \right)},
\end{aligned}$$

which goes to 1 as $l \to \infty$.

□
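The threshold in Theorem 18 and the error bound from the converse are easy to evaluate for concrete rate vectors. The sketch below is illustrative only: `byzantine_capacity` computes the minimum sum rate over all subsets of $n - 2t$ encoders, and `error_lower_bound` evaluates $1 - 2^{-l(R - C)}$ for a message rate $R$ above that threshold $C$. Both function names are assumptions made for this example.

```python
from itertools import combinations
from typing import Sequence


def byzantine_capacity(rates: Sequence[float], t: int) -> float:
    """Minimum of the sum rate over all subsets of n - 2t encoders."""
    subset_size = len(rates) - 2 * t
    if subset_size <= 0:
        raise ValueError("need n > 2t for a positive number of trusted encoders")
    return min(sum(subset) for subset in combinations(rates, subset_size))


def error_lower_bound(rate: float, capacity: float, blocklength: int) -> float:
    """Converse bound 1 - 2^{-l (R - C)}, meaningful when rate > capacity."""
    if rate <= capacity:
        raise ValueError("the converse bound applies only above the threshold")
    return 1.0 - 2.0 ** (-blocklength * (rate - capacity))
```

For the $R$-1-1 problem ($n = 3$, $t = 1$) with $R = 0.5$, the subsets have size one, so the threshold is simply the smallest individual rate, 0.5.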


Notice now that according to Theorem 18, the capacity of the corresponding channel in the $R$-1-1 problem is $R$. For source-channel separation, the condition $R(D) \le R$ must hold. But the scheme proposed in Section 4.5 works for $R(D) > R$, which implies that the channel is being operated above capacity. Even though operating the channel above capacity is useless from the point of view of reliable communication, it appears to be beneficial from the point of view of rate-distortion. Note that in the end the decoder has to choose from two messages only, one of which it knows is the correct message. This is not sufficient for reliable communication, since the decoder cannot unequivocally determine which of the two messages is correct. However, the reduction to two messages, one of which is correct, yields benefits from the point of view of rate-distortion, since the two messages are constrained to be within a certain distortion-typical set.

It is instructive to consider the $R$-1-1 problem for the Hamming distortion case. Since separation is optimal in the Hamming case, we have $R(D) \le R$ (cf. Theorem 14 and Theorem 15). Consider now the $R$-$R$-$R$ problem, i.e., all three encoders have rate $R$. Theorems 14 and 15 tell us that we must again have $R(D) \le R$. Thus in the Hamming case, the $R$-1-1 and $R$-$R$-$R$ problems have the same rate region. This signifies that in the $R$-1-1 Hamming problem, the extra rate available to the decoder from Encoders 2 and 3 is useless; the same distortion could be achieved if Encoders 2 and 3 transmitted at the lower rate $R$. This is because it is optimal for the decoder to output the quantized sequence transmitted by Encoder 1 (which is the centroid of the corresponding Hamming ball) even if Encoders 2 and 3 send the complete source sequence as in the $R$-1-1 problem. The adversarial channel reveals a lot of information to the decoder through the source sequence that the traitor chooses to transmit, since the decoder eventually receives two source sequences, one of which is the true source sequence. However, this additional information is not useful at all, since all the decoder needs to know to make an optimal decision is Encoder 1's quantized sequence.

The erasure distortion measure, however, is more stringent than the Hamming distortion measure, since it does not allow the decoder to make errors in its reconstruction. For this reason, the decoder needs to be absolutely certain about any non-erased bit it outputs in its reconstruction, and must output erasures for any bit about which it is not certain. This is not the case with Hamming distortion, since the decoder can always guess for any bit about which it is uncertain. In order to achieve the same distortion, the erasure distortion measure requires the decoder to have more information than the Hamming distortion measure. It turns out that the information revealed by the adversarial channel, which is useless in the Hamming case, accounts for the additional information required in the erasure case. This allows Encoder 1 to transmit at a rate lower than the erasure rate-distortion function by performing Hamming quantization instead of erasure quantization, with the remaining information being supplied to the decoder by the adversarial channel.

APPENDIX A

CHAPTER  2:  PROOFS 


A.1 Preliminaries

We define a multi-variable mutual information as follows:

$$I_K(X_1; X_2; \ldots; X_K) = D\left( p(X_1, \ldots, X_K)\ \Big\|\ \prod_{i=1}^{K} p(X_i) \right) = \sum_{i=1}^{K} H(X_i) - H(X_1, \ldots, X_K).$$

In particular, $I_1(X) = 0$. The multi-variable mutual information, as defined above, is a measure of the mutual dependence among $K$ random variables and is different from McGill's multivariate mutual information [41]. We note the following properties of $I_K(X_1; X_2; \ldots; X_K)$.

1. $I_K(X_1; \ldots; X_K) \ge 0$.

2. $I_K(X_1; \ldots; X_K) \ge I_m(X_1; \ldots; X_m) + I_{K-m+1}(f(X_1, \ldots, X_m); X_{m+1}; \ldots; X_K)$, where $f(X_1, \ldots, X_m)$ is a function of the random variables $X_1, \ldots, X_m$, $m < K$.

Remark: This property holds by symmetry for the general case when $f(\cdot)$ is a function of any size-$m$ subset of $X_1, \ldots, X_K$.

Proof.

$$\begin{aligned}
I_K(X_1; \ldots; X_K) &= \sum_{i=1}^{m} H(X_i) + \sum_{i=m+1}^{K} H(X_i) - H(X_1, \ldots, X_m) - H(X_{m+1}, \ldots, X_K \mid X_1, \ldots, X_m) \\
&= I_m(X_1; \ldots; X_m) + \sum_{i=m+1}^{K} H(X_i) - H(X_{m+1}, \ldots, X_K \mid X_1, \ldots, X_m) \\
&= I_m(X_1; \ldots; X_m) + \sum_{i=m+1}^{K} H(X_i) - H(X_{m+1}, \ldots, X_K \mid X_1, \ldots, X_m, f(X_1, \ldots, X_m)) \\
&\ge I_m(X_1; \ldots; X_m) + \sum_{i=m+1}^{K} H(X_i) - H(X_{m+1}, \ldots, X_K \mid f(X_1, \ldots, X_m)) \\
&= I_m(X_1; \ldots; X_m) + I_{K-m+1}(f(X_1, \ldots, X_m); X_{m+1}; \ldots; X_K),
\end{aligned}$$

where the solitary inequality holds because conditioning never increases entropy. □

3. $I_K(X_1; \ldots; X_i; \ldots; X_K) \ge I_K(X_1; \ldots; f(X_i); \ldots; X_K)$, where $f(X_i)$ is a function of the random variable $X_i$. This is the data processing inequality for the multi-variable mutual information and is a special case of Property 2.
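The definition and Properties 1 and 3 can be illustrated on a small joint distribution. The sketch below computes the multi-variable mutual information as the sum of marginal entropies minus the joint entropy, with a joint pmf represented as a dictionary from outcome tuples to probabilities. This representation and all function names are assumptions made for the example. The test distribution is two independent fair bits together with their XOR, which is pairwise independent but jointly dependent.

```python
import math
from collections import defaultdict
from itertools import product
from typing import Callable, Dict, Hashable, Tuple

Joint = Dict[Tuple[Hashable, ...], float]


def entropy(pmf: Dict[Hashable, float]) -> float:
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0.0)


def marginal(joint: Joint, coords: Tuple[int, ...]) -> Joint:
    """Marginal pmf of the coordinates listed in `coords`."""
    out: Dict[Tuple[Hashable, ...], float] = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in coords)] += p
    return dict(out)


def multi_information(joint: Joint) -> float:
    """I_K(X_1; ...; X_K) = sum_i H(X_i) - H(X_1, ..., X_K), in bits."""
    k = len(next(iter(joint)))
    return sum(entropy(marginal(joint, (i,))) for i in range(k)) - entropy(joint)


def apply_to_coordinate(joint: Joint, i: int, f: Callable) -> Joint:
    """Joint pmf after replacing X_i by f(X_i), as in Property 3."""
    out: Dict[Tuple[Hashable, ...], float] = defaultdict(float)
    for outcome, p in joint.items():
        out[outcome[:i] + (f(outcome[i]),) + outcome[i + 1:]] += p
    return dict(out)


# Two independent fair bits and their XOR.
xor_joint: Joint = {(a, b, a ^ b): 0.25 for a, b in product((0, 1), repeat=2)}
```

Every pair in `xor_joint` has zero mutual information, yet the three variables together carry one bit of multi-variable mutual information. Replacing the third coordinate by a constant function of itself drops the value to zero, consistent with Property 3.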


A.2  Proof  of  Theorem  3 

Let  Dk  <  1  —  |  and  rational.  Let  firi^J\f  and  gKr  A  C  J\f,  JC  ^  0,  be  a  code 
that  achieves  ( Rk(Dk ),  Lb, . . . ,  Dk , . . . ,  Dn).  Let  Rk(Dk)  be  the  rate  of  /),  i  e 
A/”.  Consider  endowing  the  source  with  an  uniform  distribution  over  X1 
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for  analysis  purposes.  From  the  proof  of  Theorem  7  ( cf. .  (A3)  and  (A.4))  and 
using  the  fact  that  the  worst-case  distortion  is  no  lower  than  the  average-case 
distortion,  we  obtain  Ik(fSl]  ■  ■  ■  \  fsk)  =  0. 


Let $\hat{X}^l_{s_i}$ be the reconstructed source string when the decoder has access to the $s_i$th description only. By Property 3 of the multi-variable mutual information, $I_k(\hat{X}^l_{s_1}; \ldots; \hat{X}^l_{s_k}) \le I_k(f_{s_1}; \ldots; f_{s_k}) = 0$ for all $S \subset \mathcal{N}$, $|S| = k$. By Property 2 of the multi-variable mutual information, $I(\hat{X}^l_i; \hat{X}^l_j) = 0$ for all $i, j \in \mathcal{N}$, $i \ne j$, and thus $I(\hat{X}_{i,t}; \hat{X}_{j,t}) = 0$ for all $i, j \in \mathcal{N}$, $i \ne j$, and $t = 1, \ldots, l$. Now if any two of the $\hat{X}^l_{s_i}$ disagree in a source symbol they reveal, then the resulting single-message distortion is going to be $\infty$ and the result follows trivially, so suppose that the $\hat{X}^l_{s_i}$ are consistent. Then by Lemma 1, we have

$$\sum_{i=1}^{n}\ \max_{x^l \in \mathcal{X}^l}\ \frac{1}{l}\sum_{t=1}^{l} d(x_t, \hat{X}_{i,t}) \ge n - 1,$$

which implies

$$D_1 = \max_{i \in \mathcal{N}}\ \max_{x^l \in \mathcal{X}^l}\ \frac{1}{l}\sum_{t=1}^{l} d(x_t, \hat{X}_{i,t}) \ge \frac{n-1}{n} = 1 - \frac{1}{n}.$$

This completes the proof.


A.3  Proof  of  Theorem  4 

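If $R' < R_k(D_k)$, then the sum rate of any $k$ descriptions is strictly less than $1 - D_k$, and the source string cannot be reconstructed with distortion $D_k$. Thus the rate of each description must be at least $R_k(D_k)$. Now, in light of the previous theorem, it suffices to show that for any $(R_k(D_k), D_1, \ldots, D_k, \ldots, D_n) \in \mathcal{RD}_{\mathrm{worst}}$, if $D_1 = 1 - \frac{1}{n}$, then $D_m \ge 1 - \frac{m}{n}$ for $m < k$. Let $S = \{s_1, \ldots, s_k\}$ and $M = \{s_1, \ldots, s_m\}$. Let $\hat{X}^l_M$ be the source reconstruction when the decoder has access to the set of descriptions indexed by the elements in $M$. Then from (A.4) and Properties 2 and 3 of the multi-variable mutual information, it follows that

$$I_{k-m+1}(\hat{X}^l_M; \hat{X}^l_{s_{m+1}}; \ldots; \hat{X}^l_{s_k}) \le I_{k-m+1}(\hat{X}^l_M; f_{s_{m+1}}; \ldots; f_{s_k}) \le I_k(f_{s_1}; \ldots; f_{s_k}) = 0,$$

and thus $I_{k-m+1}(\hat{X}_{M,t}; \hat{X}_{s_{m+1},t}; \ldots; \hat{X}_{s_k,t}) = 0$ for $t = 1, \ldots, l$. This implies that for each $t$, the $(n - m + 1)$ random variables $\{\hat{X}_{M,t}; \hat{X}_{s_{m+1},t}; \ldots; \hat{X}_{s_n,t}\}$ are pairwise independent, and therefore by Lemma 1,

$$\max_{x^l \in \mathcal{X}^l}\ \frac{1}{l}\sum_{t=1}^{l} d(x_t, \hat{X}_{M,t}) + \sum_{i=m+1}^{n}\ \max_{x^l \in \mathcal{X}^l}\ \frac{1}{l}\sum_{t=1}^{l} d(x_t, \hat{X}_{s_i,t}) \ge n - m.$$

Since $D_1 = 1 - \frac{1}{n}$, we have

$$\max_{x^l \in \mathcal{X}^l}\ \frac{1}{l}\sum_{t=1}^{l} d(x_t, \hat{X}_{s_i,t}) \le 1 - \frac{1}{n}$$

for $m + 1 \le i \le n$, and thus

$$\begin{aligned}
\max_{x^l \in \mathcal{X}^l}\ \frac{1}{l}\sum_{t=1}^{l} d(x_t, \hat{X}_{M,t}) &\ge n - m - \sum_{i=m+1}^{n}\ \max_{x^l \in \mathcal{X}^l}\ \frac{1}{l}\sum_{t=1}^{l} d(x_t, \hat{X}_{s_i,t}) \\
&\ge n - m - (n - m)\left(1 - \frac{1}{n}\right) \\
&= \frac{n - m}{n} = 1 - \frac{m}{n},
\end{aligned}$$

which implies

$$D_m = \max_{M \subset \mathcal{N},\ |M| = m}\ \max_{x^l \in \mathcal{X}^l}\ \frac{1}{l}\sum_{t=1}^{l} d(x_t, \hat{X}_{M,t}) \ge 1 - \frac{m}{n}.$$

This completes the proof.

A.4 Proof of Theorem 5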


Since $m$ divides $n$, we can form $n/m$ sets consisting of $m$ messages each. Denote these sets by $M_1, \ldots, M_{n/m}$, where $M_i \subset \{f_1, \ldots, f_n\}$, $|M_i| = m$, and $M_i \cap M_j = \emptyset$, $i, j \in \{1, \ldots, n/m\}$, $i \ne j$. Since $m \le k/2$, there exists a set $S = \{s_1, \ldots, s_k\}$ of $k$ messages containing $M_i$ and $M_j$ for some $i, j \in \{1, \ldots, n/m\}$, $i \ne j$. Let $\hat{X}^l_{M_i}$ be the source reconstruction when the decoder has access to the messages in $M_i$ only. By Property 2 of the multi-variable mutual information, it follows that for the set $S$ containing $M_i$ and $M_j$,

$$I_{k-2m+2}(\hat{X}^l_{M_i}; \hat{X}^l_{M_j}; f_r; \ldots; f_{r+k-2m-1}) \le I_k(f_{s_1}; \ldots; f_{s_k}) = 0,$$

where $f_r, \ldots, f_{r+k-2m-1} \in \{f_{s_1}, \ldots, f_{s_k}\} \setminus \{M_i, M_j\}$. By Lemma 1, we have

$$\sum_{i=1}^{n/m}\ \max_{x^l \in \mathcal{X}^l}\ \frac{1}{l}\sum_{t=1}^{l} d(x_t, \hat{X}_{M_i,t}) \ge \frac{n}{m} - 1,$$

and thus

$$\begin{aligned}
D_m &= \max_{M \subset \mathcal{N},\ |M| = m}\ \max_{x^l \in \mathcal{X}^l}\ \frac{1}{l}\sum_{t=1}^{l} d(x_t, \hat{X}_{M,t}) \\
&\ge \max_{i \in \{1, \ldots, n/m\}}\ \max_{x^l \in \mathcal{X}^l}\ \frac{1}{l}\sum_{t=1}^{l} d(x_t, \hat{X}_{M_i,t}) \\
&\ge \frac{m}{n}\left(\frac{n}{m} - 1\right) = 1 - \frac{m}{n}.
\end{aligned}$$

This completes the proof.

A.5 Proof of Theorem 7

The  proof  of  the  first  part  of  Theorem  7  is  simple.  Let  Dk  >  1  — No  excess  rate 
for  every  k  descriptions  implies  that  every  description  has  rate  Rk(Dk).  If  the 
decoder  receives  m  descriptions,  then  it  receives  a  sum-rate  of  mRk(Dk)  bits  per 
source  symbol.  Using  the  point-to-point  rate-distortion  function  for  a  binary 
source  with  erasure  distortion,  we  get  Dm  >  1  —  mRk(Dk). 

The  proof  of  the  second  part  of  Theorem  7  is  less  trivial.  We  begin  with  a 
lemma  which  is  similar  in  spirit  to  Lemma  1  for  worst-case  distortion. 

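Lemma 5. Let $\hat{X}_1, \ldots, \hat{X}_n$ be erased versions (Definition 4) of a uniform binary random variable $X$ taking values in $\{+, -\}$. If $\left(1 - \frac{1}{n}\right)^k \le \frac{1}{2}$ and $I_k(\hat{X}_{s_1}; \ldots; \hat{X}_{s_k}) = 0$ $\forall\, S = \{s_1, \ldots, s_k\}$, $S \subset \mathcal{N}$, $|S| = k$, then $\sum_{i=1}^{n} \Pr(\hat{X}_i = 0) \ge n - 1$.

Proof. $\left(1 - \frac{1}{n}\right)^k \le \frac{1}{2} \Rightarrow \left(\frac{1}{2}\right)^{1/k} \ge 1 - \frac{1}{n}$. We have the following four cases:

Case I: There exists $i \in \mathcal{N}$ such that $\Pr(\hat{X}_i = +) > 0$ and $\Pr(\hat{X}_i = -) > 0$.
Assume $i = 1$ without loss of generality. Since $\hat{X}_1, \ldots, \hat{X}_n$ are erased versions of the same variable, they can never disagree in the source symbol they reveal (i.e., if $\hat{X}_i = +$ for some $i \in \mathcal{N}$, then the rest cannot be $-$, and if $\hat{X}_i = -$, then the rest cannot be $+$). Thus $\Pr(\hat{X}_1 = +, \hat{X}_j = -) = 0$, $j \in \{2, \ldots, n\}$. Since $I_k(\hat{X}_{s_1}; \ldots; \hat{X}_{s_k}) = 0$ for any set of $k$ variables containing $\hat{X}_1$ and $\hat{X}_j$, $\hat{X}_1$ and $\hat{X}_j$ must be independent. Thus

$$\Pr(\hat{X}_1 = +) \cdot \Pr(\hat{X}_j = -) = \Pr(\hat{X}_1 = +, \hat{X}_j = -) = 0 \ \Rightarrow\ \Pr(\hat{X}_j = -) = 0. \qquad (A.1)$$

Likewise, $\Pr(\hat{X}_1 = -, \hat{X}_j = +) = 0 \Rightarrow \Pr(\hat{X}_j = +) = 0$. Thus $\Pr(\hat{X}_j = 0) = 1$ for $j \in \{2, \ldots, n\}$, and so $\sum_{i=1}^{n} \Pr(\hat{X}_i = 0) \ge n - 1$.

Case II: There exists $i \in \mathcal{N}$ such that $\Pr(\hat{X}_i = +) > 0$ and $\Pr(\hat{X}_i = -) = 0$, and Case I does not hold.

Let $S = \{s_1, \ldots, s_k\}$ be a size-$k$ subset of $\mathcal{N}$. For all $T \subset S$, denote by $E_T$ the event that $\hat{X}_{s_j} = -$ $\forall\, s_j \in T$, and $\hat{X}_{s_j} = 0$ $\forall\, s_j \notin T$, $s_j \in S$. Now since $\Pr(\hat{X}_{s_j} = -) = 0$ from (A.1), $\Pr(E_T) = 0$ $\forall\, T \ne \emptyset$. Thus

$$\Pr(X = -) \le \sum_{T \subset S} \Pr(E_T) = \Pr(\hat{X}_{s_1} = \hat{X}_{s_2} = \cdots = \hat{X}_{s_k} = 0). \qquad (A.2)$$

Since $\Pr(X = -) = 1/2$ and $(\hat{X}_{s_1}, \ldots, \hat{X}_{s_k})$ are independent, (A.2) yields

$$\prod_{j=1}^{k} \Pr(\hat{X}_{s_j} = 0) = \Pr(\hat{X}_{s_1} = \hat{X}_{s_2} = \cdots = \hat{X}_{s_k} = 0) \ge \frac{1}{2}.$$

In order to lower bound $\sum_{i=1}^{n} \Pr(\hat{X}_i = 0)$, we solve

$$\min \sum_{i=1}^{n} \Pr(\hat{X}_i = 0) \quad \text{s.t.} \quad \prod_{j=1}^{k} \Pr(\hat{X}_{s_j} = 0) \ge \frac{1}{2} \quad \forall\, S = \{s_1, \ldots, s_k\} \subset \mathcal{N}.$$

This is a convex optimization problem, as can be readily seen by substituting $a_j = \log \Pr(\hat{X}_j = 0)$, and can therefore be solved by choosing $\Pr(\hat{X}_j = 0) = \left(\frac{1}{2}\right)^{1/k}$ for $j = 1, \ldots, n$. Thus $\sum_{j=1}^{n} \Pr(\hat{X}_j = 0) \ge n \left(\frac{1}{2}\right)^{1/k} \ge n(1 - 1/n) = n - 1$.

Case III: There exists $i \in \mathcal{N}$ such that $\Pr(\hat{X}_i = -) > 0$ and $\Pr(\hat{X}_i = +) = 0$, and Case I does not hold.

This case is symmetric to Case II.

Case IV: For all $i \in \mathcal{N}$, $\Pr(\hat{X}_i = +) = \Pr(\hat{X}_i = -) = 0$.

We have $\sum_{i=1}^{n} \Pr(\hat{X}_i = 0) \ge \sum_{i=2}^{n} \Pr(\hat{X}_i = 0) = n - 1$. □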
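The arithmetic in Case II can be checked directly: whenever $(1 - 1/n)^k \le 1/2$, the symmetric choice $\Pr(\hat{X}_j = 0) = (1/2)^{1/k}$ meets every product constraint with equality and gives a sum of at least $n - 1$. The following is a small numerical sketch with illustrative function names. It does not solve the optimization problem itself, only evaluates the symmetric point that the proof identifies as the minimizer.

```python
def lemma_condition_holds(n: int, k: int) -> bool:
    """Hypothesis of the lemma: (1 - 1/n)^k <= 1/2."""
    return (1.0 - 1.0 / n) ** k <= 0.5


def symmetric_erasure_probability(k: int) -> float:
    """Per-variable erasure probability (1/2)^(1/k) at the symmetric point."""
    return 0.5 ** (1.0 / k)


def symmetric_erasure_sum(n: int, k: int) -> float:
    """Sum of the n erasure probabilities at the symmetric point."""
    return n * symmetric_erasure_probability(k)
```

With $n = 3$ and $k = 2$, for example, the condition holds since $(2/3)^2 \approx 0.44$, and the symmetric sum is about 2.12, which exceeds $n - 1 = 2$.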

We are now in a position to prove the second part of Theorem 7. Let $D_k < 1 - \frac{k}{n}$,
$D_k$ rational, and $(1 - \frac{1}{n})^k \le \frac{1}{2}$, and let $f_i$, $i \in \mathcal{N}$, and $g_{\mathcal{K}}$, $\mathcal{K} \subseteq \mathcal{N}$, $\mathcal{K} \neq \emptyset$, be
a code that achieves the rate-distortion vector $(R_k(D_k), D_1, \dots, D_k, \dots, D_n)$. Let
$f_i$, $i \in \mathcal{N}$, have rate $R_k(D_k)$. We have

\[
l R_k(D_k) \ge H(f_i), \quad i \in \mathcal{N}. \tag{A.3}
\]

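Let $\hat{X}^l_S$ be the reconstruction when the source $X^l$ is reconstructed from a set
$S$ of descriptions. Since $D_k$ is finite, the decoder cannot make errors in its
reconstruction (which would incur infinite distortion). Thus $\hat{X}^l_S$ must be an
erased version of $X^l$, i.e., for all $t \in \{1, \dots, l\}$, $\hat{X}_{S,t} = X_t$ or $\hat{X}_{S,t} = 0$. Then
for all $S = \{s_1, \dots, s_k\} \subseteq \mathcal{N}$, $|S| = k$, we have

\begin{align*}
H(f_{s_1} \dots f_{s_k}) &\ge H(\hat{X}^l_S) \\
&\ge I(X^l; \hat{X}^l_S) \\
&= H(X^l) - H(X^l \mid \hat{X}^l_S) \\
&= l - \sum_{t=1}^{l} H(X_t \mid \hat{X}^l_S, X_1, \dots, X_{t-1}) \\
&\ge l - \sum_{t=1}^{l} H(X_t \mid \hat{X}_{S,t}) \\
&= l - \sum_{t=1}^{l} H(X_t \mid \hat{X}_{S,t} = 0) \cdot \Pr(\hat{X}_{S,t} = 0) \\
&\ge l - \sum_{t=1}^{l} \Pr(\hat{X}_{S,t} = 0) \\
&= l - E\left[ \sum_{t=1}^{l} 1(\hat{X}_{S,t} = 0) \right] \\
&\ge l - l D_k = l(1 - D_k). \tag{A.4}
\end{align*}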

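Thus

\[
\mathcal{I}(f_{s_1}; \dots; f_{s_k}) = \sum_{j=1}^{k} H(f_{s_j}) - H(f_{s_1} \dots f_{s_k})
\le k l R_k(D_k) - l(1 - D_k) = 0.
\]

Let $\hat{X}^l_{s_j}$ be the reconstruction when the decoder receives the $s_j$-th description
only. Then $\mathcal{I}(\hat{X}^l_{s_1}; \dots; \hat{X}^l_{s_k}) \le \mathcal{I}(f_{s_1}; \dots; f_{s_k}) = 0$ (Property 3) and so
$\mathcal{I}(\hat{X}_{s_1,t}; \dots; \hat{X}_{s_k,t}) = 0$, $t \in \{1, \dots, l\}$. By Lemma 5, $\sum_{i=1}^{n} \Pr(\hat{X}_{i,t} = 0) \ge n - 1$
for $t \in \{1, \dots, l\}$. Thus

\[
\frac{1}{l} \sum_{t=1}^{l} \sum_{i=1}^{n} \Pr(\hat{X}_{i,t} = 0) \ge n - 1
\implies \max_{i \in \mathcal{N}} \frac{1}{l} \sum_{t=1}^{l} \Pr(\hat{X}_{i,t} = 0) \ge 1 - \frac{1}{n}.
\]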

This  completes  the  proof. 


A.6  Proof  of  Theorem  8 


We  establish  two  lemmas  before  proving  Theorem  8. 


Lemma 6. Let $X_1$, $X_2$, and $X_3$ be Bernoulli random variables such that $I(X_i; X_j) = 0$
for all $i, j \in \{1, 2, 3\}$, $i \neq j$, and $\Pr(X_1 = X_2 = X_3 = 0) \ge \frac{1}{2}$. Let $p = \max(\Pr(X_1 = 0), \Pr(X_2 = 0))$. Then

\[
\Pr(X_3 = 0) \ge \frac{1}{2} + \frac{p(1-p)}{2p - 1}.
\]

Proof. If $p = 1$, then the conclusion follows directly from the hypothesis, so
suppose that $p < 1$. Let $p_i$ denote $\Pr(X_i = 0)$, $p(x_1, x_2, x_3)$ denote $\Pr(X_1 = x_1, X_2 = x_2, X_3 = x_3)$,
and $p_{x_3|x_1,x_2}$ denote $\Pr(X_3 = x_3 \mid X_1 = x_1, X_2 = x_2)$.
Let $q_0 = p_{0|0,0}$, $q_1 = p_{0|0,1}$, and $q_2 = p_{0|1,1}$. We thus have $p(0,0,0) = p_1 p_2 q_0$,
$p(0,1,0) = p_1 (1 - p_2) q_1$, and $p(1,1,0) = (1 - p_1)(1 - p_2) q_2$. Then

\begin{align*}
\Pr(X_1 = 0, X_3 = 0) &= p(0,0,0) + p(0,1,0) \\
&= p_1 (p_2 q_0 + (1 - p_2) q_1), \tag{A.5} \\
\Pr(X_2 = 1, X_3 = 0) &= p(0,1,0) + p(1,1,0) \\
&= (1 - p_2)(p_1 q_1 + (1 - p_1) q_2). \tag{A.6}
\end{align*}

Since $(X_1, X_3)$ and $(X_2, X_3)$ are pairwise independent, we have, from (A.5) and
(A.6),

\begin{align*}
\Pr(X_1 = 0, X_3 = 0) = p_1 p_3 &= p_1 (p_2 q_0 + (1 - p_2) q_1) \\
\implies p_3 &= p_2 q_0 + (1 - p_2) q_1, \tag{A.7} \\
\Pr(X_2 = 1, X_3 = 0) = (1 - p_2) p_3 &= (1 - p_2)(p_1 q_1 + (1 - p_1) q_2) \\
\implies p_3 &= p_1 q_1 + (1 - p_1) q_2. \tag{A.8}
\end{align*}


From (A.7) and (A.8),

\[
p_1 q_1 + (1 - p_1) q_2 = p_2 q_0 + (1 - p_2) q_1
\implies q_2 = \frac{p_2 q_0 - (p_1 + p_2 - 1) q_1}{1 - p_1}. \tag{A.9}
\]

Since $p(0,0,0) \ge 1/2$ by hypothesis, we have $p_1 p_2 \ge 1/2$, and thus $p_1 + p_2 - 1 > 0$.
Now since $q_2 \le 1$, (A.9) gives

\[
1 \ge \frac{p_2 q_0 - (p_1 + p_2 - 1) q_1}{1 - p_1}
\implies q_1 \ge \frac{p_2 q_0 - (1 - p_1)}{p_1 + p_2 - 1}. \tag{A.10}
\]

Now

\[
p(0,0,0) = p_1 p_2 q_0 \ge \frac{1}{2} \implies p_2 q_0 \ge \frac{1}{2 p_1}. \tag{A.11}
\]


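Assume without loss of generality that $p_1 \ge p_2$. Then $p_1 + p_2 \le 2 p_1$. Substituting
this and (A.11) into (A.10) yields

\[
q_1 \ge \frac{\frac{1}{2 p_1} - (1 - p_1)}{2 p_1 - 1} = \frac{p_1}{2 p_1 - 1} - \frac{1}{2 p_1}. \tag{A.12}
\]

Upon substituting (A.11) and (A.12) into (A.7), we get

\begin{align*}
p_3 &\ge \frac{1}{2 p_1} + (1 - p_2) \left( \frac{p_1}{2 p_1 - 1} - \frac{1}{2 p_1} \right) \\
&\ge \frac{1}{2 p_1} + (1 - p_1) \left( \frac{p_1}{2 p_1 - 1} - \frac{1}{2 p_1} \right) \\
&= \frac{1}{2} + \frac{p_1 (1 - p_1)}{2 p_1 - 1},
\end{align*}

where the last inequality follows because $p_2 \le p_1$ and

\[
\frac{p_1}{2 p_1 - 1} - \frac{1}{2 p_1} \ge 0.
\]

□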
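
The final simplification in the proof of Lemma 6 is easy to get wrong by hand, so here is a small exact-arithmetic check in Python. It is only a sketch: the function names are illustrative, and it verifies the algebraic identity $\frac{1}{2p} + (1-p)\big(\frac{p}{2p-1} - \frac{1}{2p}\big) = \frac{1}{2} + \frac{p(1-p)}{2p-1}$ and the sign of $\frac{p}{2p-1} - \frac{1}{2p}$ on sample values of $p \in (1/2, 1]$, not the lemma itself.

```python
from fractions import Fraction


def intermediate_bound(p: Fraction) -> Fraction:
    """1/(2p) + (1 - p) * (p/(2p - 1) - 1/(2p)): the bound before simplification."""
    return 1 / (2 * p) + (1 - p) * (p / (2 * p - 1) - 1 / (2 * p))


def lemma6_bound(p: Fraction) -> Fraction:
    """1/2 + p(1 - p)/(2p - 1): the bound stated in Lemma 6."""
    return Fraction(1, 2) + p * (1 - p) / (2 * p - 1)


def q1_gap(p: Fraction) -> Fraction:
    """p/(2p - 1) - 1/(2p): the quantity the proof needs to be nonnegative."""
    return p / (2 * p - 1) - 1 / (2 * p)
```

Evaluating these at, say, `Fraction(3, 4)` gives the same value `Fraction(7, 8)` for both forms of the bound.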


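Corollary 1. Let $X_1$, $X_2$, $X_3$ and $X_4$ be Bernoulli random variables such that
$I(X_i; X_j) = 0$ for all $i, j \in \{1, 2, 3, 4\}$, $i \neq j$, and $\Pr(X_1 = X_2 = X_3 = X_4 = 0) \ge \frac{1}{2}$.
Then

\[
\sum_{i=1}^{4} \Pr(X_i = 0) \ge 3.
\]

Proof. Let $p_i = \Pr(X_i = 0)$. Assume WLOG that $p_1 \ge p_2 \ge p_3 \ge p_4$. Now
$p_3 p_4 = \Pr(X_3 = X_4 = 0) \ge 1/2$ by hypothesis, which implies $p_3 \ge 1/\sqrt{2}$ and
$p_4 \ge 1/(2 p_3)$. Applying Lemma 6 to $X_2$, $X_3$, and $X_4$ gives $p_2 \ge \frac{1}{2} + \frac{p_3 (1 - p_3)}{2 p_3 - 1}$. Thus

\begin{align*}
\sum_{i=1}^{4} p_i &= p_1 + p_2 + p_3 + p_4 \\
&\ge 2 p_2 + p_3 + p_4 \\
&\ge \min_{x \in [1/\sqrt{2},\, 1]} \left( 2 \max\left\{ x,\ \frac{1}{2} + \frac{x(1-x)}{2x - 1} \right\} + x + \frac{1}{2x} \right).
\end{align*}

Since $\frac{1}{2} + \frac{p_3(1 - p_3)}{2 p_3 - 1}$ is monotonically decreasing in $p_3$ for $p_3 \in (1/2, 1]$, it is easy to
verify that

\[
\max\left\{ x,\ \frac{1}{2} + \frac{x(1-x)}{2x - 1} \right\} =
\begin{cases}
x & \text{if } x \ge \frac{1}{2} + \frac{1}{\sqrt{12}}, \\
\frac{1}{2} + \frac{x(1-x)}{2x - 1} & \text{if } x \le \frac{1}{2} + \frac{1}{\sqrt{12}},
\end{cases}
\]

where $\frac{1}{2} + \frac{1}{\sqrt{12}}$ is the admissible solution to the equation $x = \frac{1}{2} + \frac{x(1-x)}{2x-1}$. Thus

\begin{align*}
\sum_{i=1}^{4} p_i &\ge \min\left\{ \min_{x \in [\frac{1}{\sqrt{2}},\, \frac{1}{2} + \frac{1}{\sqrt{12}}]} \left( 2\left( \frac{1}{2} + \frac{x(1-x)}{2x-1} \right) + x + \frac{1}{2x} \right),\ \min_{x \in [\frac{1}{2} + \frac{1}{\sqrt{12}},\, 1]} \left( 2x + x + \frac{1}{2x} \right) \right\} \\
&= \min\left\{ \min_{x \in [\frac{1}{\sqrt{2}},\, \frac{1}{2} + \frac{1}{\sqrt{12}}]} \left( 1 + \frac{1}{2x} + \frac{x}{2x-1} \right),\ \min_{x \in [\frac{1}{2} + \frac{1}{\sqrt{12}},\, 1]} \left( 3x + \frac{1}{2x} \right) \right\} \\
&= \min(3, 3) = 3,
\end{align*}

where the penultimate equality follows from the fact that $1 + \frac{1}{2x} + \frac{x}{2x-1}$ is
monotonically decreasing in $x$ for $x \in [\frac{1}{\sqrt{2}}, \frac{1}{2} + \frac{1}{\sqrt{12}}]$ and takes a minimum value
of 3 at $x = \frac{1}{2} + \frac{1}{\sqrt{12}}$, and that $3x + \frac{1}{2x}$ is monotonically increasing in $x$ for
$x \in [\frac{1}{2} + \frac{1}{\sqrt{12}}, 1]$ and takes a minimum value of 3 at $x = \frac{1}{2} + \frac{1}{\sqrt{12}}$. □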
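
The minimization at the end of the proof of Corollary 1 can be checked numerically. The Python sketch below evaluates the lower bound $2\max\{x, \frac{1}{2} + \frac{x(1-x)}{2x-1}\} + x + \frac{1}{2x}$ on a grid over $[1/\sqrt{2}, 1]$; the grid search and the function names are assumptions made for illustration and do not replace the monotonicity argument.

```python
import math


def lemma6_bound(x: float) -> float:
    """1/2 + x(1 - x)/(2x - 1), the Lemma 6 lower bound on p2 with x = p3."""
    return 0.5 + x * (1 - x) / (2 * x - 1)


def objective(x: float) -> float:
    """Lower bound on p1 + p2 + p3 + p4 as a function of x = p3."""
    return 2 * max(x, lemma6_bound(x)) + x + 1 / (2 * x)


def grid_minimum(steps: int = 100_000) -> tuple[float, float]:
    """Return (argmin, min) of the objective on a uniform grid over [1/sqrt(2), 1]."""
    lo, hi = 1 / math.sqrt(2), 1.0
    best_x, best_val = lo, objective(lo)
    for i in range(1, steps + 1):
        x = lo + (hi - lo) * i / steps
        val = objective(x)
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val
```

On such a grid the minimizer lands next to $x = \frac{1}{2} + \frac{1}{\sqrt{12}} \approx 0.7887$ with a minimum value close to 3, in agreement with the closed-form argument.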


The following lemma is similar to Lemma 5, but is adapted to the $n = 4$,
$k = 2$ case, which is not covered by Lemma 5. Lemma 5 requires that $n$ and $k$
satisfy the inequality $(1 - \frac{1}{n})^k \le \frac{1}{2}$, which is violated when $n = 4$ and $k = 2$,
since $(3/4)^2 = 9/16 > 1/2$.
Indeed, much of the following proof is similar to that of Lemma 5, except for
Cases II and III, where we use Corollary 1 to bypass the condition $(1 - \frac{1}{n})^k \le \frac{1}{2}$,
which is needed in Case II of the proof of Lemma 5.

Lemma 7. Let $X_1, \dots, X_4$ be erased versions of a uniform binary random variable $X$
taking values in $\{+, -\}$. If $I(X_i; X_j) = 0$, $i, j \in \{1, \dots, 4\}$, $i \neq j$, then

\[
\sum_{i=1}^{4} \Pr(X_i = 0) \ge 3.
\]

Proof. The proof is very similar to that of Lemma 5, so we only summarize the
argument here.

Case I: There exists $i \in \{1, 2, 3, 4\}$ such that $\Pr(X_i = +) > 0$ and $\Pr(X_i = -) > 0$.
Just as in the proof of Lemma 5, we have $\sum_{i=1}^{4} \Pr(X_i = 0) \ge 4 - 1 = 3$.

Case II: There exists $i \in \{1, 2, 3, 4\}$ such that $\Pr(X_i = +) > 0$ and $\Pr(X_i = -) = 0$,
and Case I does not hold.

Assume $i = 1$ WLOG. Then from (A.1), $\Pr(X_j = -) = 0$ for $j \in \{2, 3, 4\}$. Thus
the $X_j$ are effectively binary random variables such that $\Pr(X_1 = \dots = X_4 = 0) \ge 1/2$.
By Corollary 1, $\sum_{j=1}^{4} \Pr(X_j = 0) \ge 3$.

Case III: There exists $i \in \{1, 2, 3, 4\}$ such that $\Pr(X_i = -) > 0$ and $\Pr(X_i = +) = 0$,
and Case I does not hold.

This case is analogous to Case II.

Case IV: For all $i \in \{1, 2, 3, 4\}$, $\Pr(X_i = +) = \Pr(X_i = -) = 0$.

We have $\sum_{j=1}^{4} \Pr(X_j = 0) \ge \sum_{j=2}^{4} \Pr(X_j = 0) = 4 - 1 = 3$. □

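We are now in a position to prove Theorem 8. Let $f_i$, $i \in \mathcal{N}$, and $g_{\mathcal{K}}$, $\mathcal{K} \subseteq \mathcal{N}$,
be a code that achieves the rate-distortion vector of Theorem 8, with distortions
$(D_1, D_2, D_3, D_4)$. Using the same argument as that
in the proof of the second part of Theorem 7, we have for $i, j \in \{1, 2, 3, 4\}$, $i \neq j$,
that $I(\hat{X}^l_i; \hat{X}^l_j) \le I(f_i; f_j) = 0$ and thus $I(\hat{X}_{it}; \hat{X}_{jt}) = 0$ for all $t \in \{1, \dots, l\}$. By
Lemma 7, $\sum_{i=1}^{4} \Pr(\hat{X}_{it} = 0) \ge 3$ for $t \in \{1, \dots, l\}$. It follows that

\[
\frac{1}{l} \sum_{t=1}^{l} \sum_{i=1}^{4} \Pr(\hat{X}_{it} = 0) \ge 3
\implies \max_{i} \frac{1}{l} \sum_{t=1}^{l} \Pr(\hat{X}_{it} = 0) \ge \frac{3}{4}.
\]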

This  completes  the  proof. 


A.7  Proof  of  Theorem  9 

We  establish  two  lemmas  before  proving  Theorem  9. 

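Lemma 8. Let $X_1, \dots, X_n$ be Bernoulli random variables such that $I(X_i; X_j) = 0$
for all $i, j \in \mathcal{N}$, $i \neq j$, and $\Pr(X_1 = X_2 = \dots = X_n = 0) \ge \frac{1}{2}$. Then

\[
\frac{1}{n} \sum_{i=1}^{n} \Pr(X_i = 0) \ge 1 - \frac{2}{n}.
\]

Proof. Let $p_i$ denote $\Pr(X_i = 0)$ and let $q_i = \Pr(X_i = 1) = 1 - p_i$. Since the $X_i$'s
are pairwise independent, we have

\[
E\left[ \frac{1}{n} \sum_{i=1}^{n} X_i \right] = \frac{1}{n} \sum_{i=1}^{n} q_i,
\qquad
\mathrm{Var}\left[ \frac{1}{n} \sum_{i=1}^{n} X_i \right] = \frac{1}{n^2} \sum_{i=1}^{n} p_i q_i.
\]

Let $\alpha > \frac{1}{n} \sqrt{2 \sum_{i=1}^{n} p_i q_i}$. Then, by Chebyshev's inequality,

\[
\Pr\left[ \left| \frac{1}{n} \sum_{i=1}^{n} X_i - \frac{1}{n} \sum_{i=1}^{n} q_i \right| \ge \alpha \right]
\le \frac{\mathrm{Var}\left[ \frac{1}{n} \sum_{i=1}^{n} X_i \right]}{\alpha^2}
= \frac{\sum_{i=1}^{n} p_i q_i}{n^2 \alpha^2} < \frac{1}{2}.
\]

Let $E_1$ and $E_2$ be the events $\left| \frac{1}{n} \sum_{i=1}^{n} X_i - \frac{1}{n} \sum_{i=1}^{n} q_i \right| < \alpha$ and $X_1 = X_2 = \dots = X_n = 0$,
respectively. Then $\Pr(E_1) > \frac{1}{2}$ and $\Pr(E_2) \ge \frac{1}{2}$ by hypothesis. Since
$\Pr(E_1) + \Pr(E_2) > 1$, $\Pr(E_1 \cap E_2) > 0$. This implies that

\[
\frac{1}{n} \sum_{i=1}^{n} q_i < \alpha \implies \frac{1}{n} \sum_{i=1}^{n} p_i > 1 - \alpha.
\]

Since $\alpha$ was arbitrary, this implies

\[
\frac{1}{n} \sum_{i=1}^{n} p_i \ge 1 - \frac{1}{n} \sqrt{2 \sum_{i=1}^{n} p_i q_i}. \tag{A.13}
\]

Moreover,

\[
\frac{1}{n} \sum_{i=1}^{n} p_i q_i \le \frac{1}{n} \sum_{i=1}^{n} q_i \le \frac{1}{n} \sqrt{2 \sum_{i=1}^{n} p_i q_i}.
\]

A little algebra gives

\[
\sum_{i=1}^{n} p_i q_i \le \sqrt{2 \sum_{i=1}^{n} p_i q_i} \implies \sum_{i=1}^{n} p_i q_i \le 2. \tag{A.14}
\]

Substituting (A.14) into (A.13) yields

\[
\frac{1}{n} \sum_{i=1}^{n} p_i \ge 1 - \frac{2}{n}.
\]

□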

Lemma 9. Let $X_1, \dots, X_n$ be erased versions of a uniform binary random variable $X$
taking values in $\{+, -\}$. If $I(X_i; X_j) = 0$, $i, j \in \mathcal{N}$, $i \neq j$, then

\[
\sum_{i=1}^{n} \Pr(X_i = 0) \ge n - 2.
\]

Proof. We have Cases I, II, III, and IV as in the proof of Lemma 5. Cases I and IV
are the same as those in Lemma 5, so we will only mention Cases II and III.

Case II: There exists $i \in \mathcal{N}$ such that $\Pr(X_i = +) > 0$ and $\Pr(X_i = -) = 0$, and
Case I does not hold.

Assume $i = 1$ WLOG. Then from (A.1), $\Pr(X_j = -) = 0$ for $j \in \{2, \dots, n\}$. Thus
the $X_j$'s are always erased when the binary source $X = -$, and so $\Pr(X_1 = \dots = X_n = 0) \ge 1/2$.
By Lemma 8, $\sum_{j=1}^{n} \Pr(X_j = 0) \ge n - 2$. The proof of Case III is
analogous to the proof of Case II. □

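We are now in a position to prove Theorem 9. Let $f_i$, $i \in \mathcal{N}$, and $g_{\mathcal{K}}$, $\mathcal{K} \subseteq \mathcal{N}$,
be a code that achieves the rate-distortion vector of Theorem 9, with distortions
$(D_1, D_2, \dots, D_n)$. Using the same argument as that
in the proof of the second part of Theorem 7, we have for $i, j \in \mathcal{N}$, $i \neq j$, that
$I(\hat{X}^l_i; \hat{X}^l_j) \le I(f_i; f_j) = 0$ and thus $I(\hat{X}_{it}; \hat{X}_{jt}) = 0$ for $t \in \{1, \dots, l\}$. By Lemma 9,
$\sum_{i=1}^{n} \Pr(\hat{X}_{it} = 0) \ge n - 2$ for $t \in \{1, \dots, l\}$. It follows that

\[
\frac{1}{l} \sum_{t=1}^{l} \sum_{i=1}^{n} \Pr(\hat{X}_{it} = 0) \ge n - 2
\implies \max_{i \in \mathcal{N}} \frac{1}{l} \sum_{t=1}^{l} \Pr(\hat{X}_{it} = 0) \ge 1 - \frac{2}{n}.
\]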

This  completes  the  proof. 
A.8 A Random Coding Proof of Theorem 6

Like the MDS coding scheme for worst-case distortion, the random coding
scheme consists of two parts: uncoded bits and a random binning component.
The uncoded component is similar to the uncoded component of the MDS coding
scheme. The difference lies in the encoded component; instead of encoding
an erased version using an $(n, k)$ systematic MDS code, the average-case
distortion encoder randomly bins an erased version of the source and then sends
bin indices to the decoder. The decoder outputs the uncoded bits as the source
reconstruction if fewer than $k$ descriptions are received. If $k$ or more descriptions
are received, the decoder uses the uncoded bits and the bin indices to decode the
encoded erased version using typicality considerations. A formal description of
the scheme follows.

Case I: $D_k \ge 1 - \frac{k}{n}$

Assume without loss of generality that $D_k$ is rational (if $D_k$ is irrational, then
we can prove achievability for a sequence of rational distortions in $[1 - k/n, 1]$
converging to $D_k$ and take limits). Then there exists a positive integer $l'$ such
that $l' R_k(D_k)$ is a positive integer. Choose a blocklength $l = a n l'$, where $a$ is
any positive integer. Observe a length-$l$ source sequence $X^l$, and divide $X^l$ into
$n$ disjoint parts such that each part contains $l/n = a l'$ bits. (The division is the
same regardless of the source realization.) Label the parts $X_i$, $i \in \mathcal{N}$. Choose
$l R_k(D_k)$ bits from each of the $n$ parts (since $D_k \ge 1 - \frac{k}{n}$, $l R_k(D_k) \le \frac{l}{n}$ and
therefore $l R_k(D_k)$ bits can be chosen from each part). Denote by $Y_i$ the set of
$l R_k(D_k)$ bits chosen from $X_i$. Transmit $Y_i$ uncoded over the $i$th channel.

The decoding is trivial. If $m$ descriptions, say $(Y_1, \dots, Y_m)$, are received,
output $\hat{X}^l_m$ as the reconstruction of $X^l$, where $\hat{X}^l_m$ is such that the $m l R_k(D_k)$ bits
corresponding to $(Y_1, \dots, Y_m)$ are non-erased and the other $(l - m l R_k(D_k))$ bits
are erasures. The distortion, therefore, is $(l - m l R_k(D_k))/l = 1 - m R_k(D_k)$. When
$k$ descriptions are received, the distortion is $1 - k R_k(D_k) = D_k$. Thus $R \in \mathcal{RD}_{avg}$,
and therefore also lies in $\overline{\mathcal{RD}}_{avg}$.
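
The Case I scheme can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the thesis does not say which $l R_k(D_k)$ bits are chosen from each part, so the sketch takes the first ones; erasures are represented by `None`; and the function names are hypothetical.

```python
from fractions import Fraction

ERASURE = None  # assumed in-memory marker for an erased symbol


def encode_uncoded(source: list, n: int, bits_per_part: int) -> list:
    """Split the source into n equal parts and keep the first
    bits_per_part bits of each part as that part's description."""
    part_len = len(source) // n
    assert part_len * n == len(source) and bits_per_part <= part_len
    return [source[i * part_len: i * part_len + bits_per_part] for i in range(n)]


def decode_uncoded(received: dict, n: int, length: int) -> list:
    """Place the bits of each received description (keyed by part index)
    back in position and mark every other position as an erasure."""
    part_len = length // n
    recon = [ERASURE] * length
    for i, bits in received.items():
        start = i * part_len
        recon[start: start + len(bits)] = bits
    return recon


def erasure_distortion(recon: list) -> Fraction:
    """Per-letter erasure distortion: the fraction of erased positions."""
    return Fraction(sum(sym is ERASURE for sym in recon), len(recon))
```

With $n = 4$, $k = 2$, $D_k = 3/4$ and $l = 16$, the relation $1 - k R_k(D_k) = D_k$ gives $l R_k(D_k) = 2$ bits per part, and decoding any two descriptions leaves 12 of the 16 positions erased, i.e. distortion $3/4$.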

Case II: $D_k < 1 - \frac{k}{n}$

The scheme for this case is an extension of the scheme for Case I. It has two
components: random binning and transmission of uncoded source bits. An erased
version of every source sequence is binned separately at each encoder. The
observed source string is divided into $n$ disjoint parts. Each uncoded part is then
sent on one of the $n$ channels along with the corresponding bin index of the
erased version of the source. If fewer than $k$ descriptions are received, the
decoder outputs a partial reconstruction based solely on the uncoded parts; if $k$ or
more descriptions are received, the decoder outputs a reconstruction based on
the uncoded parts and the bin indices.

Assume again that $D_k$ is rational. Choose $\epsilon > 0$, and define $R' = R_k(D_k) - 1/n + \epsilon$.
Since $D_k$ is rational, there exists a positive integer $l'$ such that $l' D_k/(n - k)$
is an integer. Choose a blocklength $l = a n l'$, where $a$ is any positive integer.

Random binning: Construct $n$ sets of bins such that every set contains $2^{l R'}$
bins. For every length-$l$ source string $x^l \in \mathcal{X}^l$, construct an erased version as
follows. Divide $x^l$ into $n$ disjoint parts such that each part contains $l/n = a l'$
bits (the division is done identically for all source sequences). For each part,
replace the last $l D_k/(n - k)$ bits by erasures (since $D_k < 1 - \frac{k}{n}$, each part contains
$l/n > l D_k/(n - k)$ bits). Assign the resulting erased version $x^l_e$ uniformly at
random, and independently from other strings, to one of the $2^{l R'}$ bins in the $i$th
set, for all $i \in \mathcal{N}$. The assignment is done only once for each erased version. This
is important because multiple source strings can have the same erased version.
Denote the assignments by $\Gamma_i$.

Encoding: Let $X^l$ be the observed source sequence. Divide $X^l$ into $n$ disjoint
parts each containing $l/n$ bits as described above. Label the parts $X_i$, $i \in \mathcal{N}$. Let
$B_i = \Gamma_i(X^l_e)$ be the index of the bin containing the erased version of $X^l$ in the $i$th
bin set. Transmit $(X_i, B_i)$ over the $i$th channel.

Decoding: If $m$ descriptions, say $\{(X_1, B_1), \dots, (X_m, B_m)\}$, are received,
where $m < k$, output $\hat{X}^l_m$ as the reconstruction of $X^l$, where $\hat{X}^l_m$
is such that the $ml/n$ bits corresponding to $(X_1, \dots, X_m)$ are non-erased
and the other $(l - ml/n)$ bits are erasures. If $m \ge k$ descriptions
are received, say $\{(X_1, B_1), \dots, (X_m, B_m)\}$, choose any $k$ descriptions, say
$\{(X_1, B_1), \dots, (X_k, B_k)\}$, and search the bins $(B_1, \dots, B_k)$ for a sequence $Y$ such
that $\Gamma_i(Y) = B_i$, $i = 1, \dots, k$, and $Y$ is consistent with the partially revealed
source string $(X_1, \dots, X_k)$. Output $\hat{X}^l_m = \{(X_1, \dots, X_m)\} \cup \{Y\}$ as the
reconstruction of $X^l$. (Thus the non-erased bits in $\hat{X}^l_m$ are the bits revealed by
$(X_1, \dots, X_m)$ or by the erased version $Y$, or both.) There is guaranteed to be at
least one such sequence $Y$ in the bins indexed by $B_1, \dots, B_k$. If there is more
than one such sequence, output the non-erased portion $(X_1, \dots, X_m)$ as the
reconstruction of $X^l$.
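
To make the erased-version construction and the merged reconstruction concrete, here is a minimal Python sketch. It assumes that the bin search succeeds (so the decoder recovers the correct erased version) and does not model the random bins themselves; `None` stands for an erasure and all function names are hypothetical.

```python
from fractions import Fraction

ERASURE = None  # assumed in-memory marker for an erased symbol


def erased_version(source: list, n: int, erased_per_part: int) -> list:
    """Replace the last erased_per_part bits of each of the n parts by erasures."""
    part_len = len(source) // n
    assert part_len * n == len(source) and erased_per_part < part_len
    out = list(source)
    for i in range(n):
        end = (i + 1) * part_len
        for t in range(end - erased_per_part, end):
            out[t] = ERASURE
    return out


def merge(source: list, erased: list, received_parts: list, n: int) -> list:
    """Union of the bits revealed by the uncoded received parts and by the
    (assumed correctly decoded) erased version."""
    part_len = len(source) // n
    recon = list(erased)
    for i in received_parts:
        recon[i * part_len: (i + 1) * part_len] = source[i * part_len: (i + 1) * part_len]
    return recon


def erasure_distortion(recon: list) -> Fraction:
    """Per-letter erasure distortion: the fraction of erased positions."""
    return Fraction(sum(sym is ERASURE for sym in recon), len(recon))
```

For $n = 4$, $k = 2$, $D_k = 1/4$ and $l = 16$, each part loses $l D_k/(n-k) = 2$ bits in the erased version. Merging with two uncoded parts leaves 4 erasures (distortion $1/4 = D_k$), and merging with three leaves 2 (distortion $1/8 = \frac{n-m}{n-k} D_k$ with $m = 3$).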

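Error analysis: We say an error $E_S$ has occurred at the decoder if, for a set
$S = \{s_1, \dots, s_k\}$ of $k$ descriptions, there exists an erased version $Y \neq X^l_e$ such
that $\Gamma_{s_j}(Y) = \Gamma_{s_j}(X^l_e)$ for all $s_j \in S$ and $Y$ is consistent with $(X_{s_1}, \dots, X_{s_k})$. Let
$C_S$ be the set of erased versions that are consistent with $(X_{s_1}, \dots, X_{s_k})$. Define
$E = \bigcup_{S: |S| = k} E_S$. We bound $\Pr(E)$ as follows.

\begin{align*}
\Pr(E) &\le \sum_{S: |S| = k} \Pr(E_S) \\
&= \sum_{S: |S| = k} \Pr\big( \exists\, Y \neq X^l_e,\ Y \in C_S : \Gamma_{s_j}(Y) = \Gamma_{s_j}(X^l_e)\ \forall s_j \in S \big) \\
&= \sum_{x^l} p(x^l) \sum_{S: |S| = k} \Pr\big( \exists\, Y \neq x^l_e,\ Y \in C_S : \Gamma_{s_j}(Y) = \Gamma_{s_j}(x^l_e)\ \forall s_j \in S \,\big|\, X^l = x^l \big) \\
&\le \sum_{x^l} p(x^l) \sum_{S: |S| = k} \sum_{y \neq x^l_e,\ y \in C_S} \Pr\big( \Gamma_{s_j}(y) = \Gamma_{s_j}(x^l_e)\ \forall s_j \in S \,\big|\, X^l = x^l \big) \\
&\le \sum_{x^l} p(x^l) \sum_{S: |S| = k} 2^{-k l R'} |C_S| \\
&= \sum_{x^l} p(x^l) \sum_{S: |S| = k} 2^{-k l \left( \frac{1 - D_k}{k} - \frac{1}{n} + \epsilon \right)} \cdot 2^{(n-k)\left( \frac{l}{n} - \frac{l D_k}{n-k} \right)} \\
&= \sum_{x^l} p(x^l) \sum_{S: |S| = k} 2^{-k l \epsilon}.
\end{align*}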
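
The exponent bookkeeping in the last two steps of the error analysis can be verified with exact rational arithmetic. The sketch below assumes $R_k(D_k) = (1 - D_k)/k$, which is the relation $k l R_k(D_k) = l(1 - D_k)$ used in the proof of Theorem 7, and checks that $-k l R' + (n-k)\big(\frac{l}{n} - \frac{l D_k}{n-k}\big) = -k l \epsilon$. The function name and the sample parameters are illustrative.

```python
from fractions import Fraction


def error_exponent(n: int, k: int, d_k: Fraction, eps: Fraction, l: int) -> Fraction:
    """Base-2 exponent of 2^(-k l R') * |C_S| for one set S of k descriptions."""
    r_k = (1 - d_k) / k                      # assumed: R_k(D_k) = (1 - D_k)/k
    r_prime = r_k - Fraction(1, n) + eps     # binning rate R' = R_k(D_k) - 1/n + eps
    log_collision = -k * l * r_prime         # log2 of 2^(-k l R')
    # log2 |C_S|: bits of the erased version not already revealed by the k parts
    log_consistent = (n - k) * (Fraction(l, n) - l * d_k / (n - k))
    return log_collision + log_consistent
```

For instance, `error_exponent(4, 2, Fraction(1, 4), Fraction(1, 10), 16)` evaluates to $-k l \epsilon = -16/5$, so each term in the sum decays exponentially in the blocklength.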


We now show that for any $\epsilon > 0$, the $(n+1)$-tuple

\[
\left( R_k(D_k) + \epsilon,\ 1 - \tfrac{1}{n} + \epsilon,\ 1 - \tfrac{2}{n} + \epsilon,\ \dots,\ 1 - \tfrac{k-1}{n} + \epsilon,\ D_k + \epsilon,\ \tfrac{n-k-1}{n-k} D_k + \epsilon,\ \tfrac{n-k-2}{n-k} D_k + \epsilon,\ \dots,\ \tfrac{1}{n-k} D_k + \epsilon,\ \epsilon \right)
\]

is achievable, and thus $R \in \overline{\mathcal{RD}}_{avg}$. Fix $\epsilon > 0$ and define $R'$ as above. In our scheme, any
description $(X_i, B_i)$ has rate $R = 1/n + R'$, where $1/n$ is the rate due to $X_i$ and
$R'$ is the rate due to binning. Thus $R = 1/n + (R_k(D_k) - 1/n + \epsilon) = R_k(D_k) + \epsilon$.
Moreover, if $m < k$ descriptions are received, the decoder outputs $ml/n$ bits as
revealed by the $m$ descriptions and the other $(l - ml/n)$ bits as erasures. Thus
$D_m = 1 - m/n < 1 - m/n + \epsilon$. If $k$ descriptions are received, say $S = \{s_1, \dots, s_k\}$,
the decoder either outputs an erased version of the correct source sequence if
$E_S^c$ occurs, or outputs $(X_{s_1}, \dots, X_{s_k})$ if $E_S$ occurs. If $E_S^c$ occurs, then the decoder
receives $kl/n$ bits uncoded from the $k$ descriptions, and is able to figure out a
further $(n - k)(l/n - l D_k/(n - k)) = l(1 - k/n - D_k)$ bits by using the bin indices
to decode the erased version of the source sequence. Hence the maximum
per-letter distortion over sets of $k$ descriptions is $1 - (k/n + 1 - k/n - D_k) = D_k$ if
$E^c$ occurs, and $1 - k/n$ if $E$ occurs. Let $d_{S,x}$ be the per-letter distortion achieved
using the set $S$ of descriptions if the observed source string is $x^l$. Thus

\[
\max_{S: |S| = k} E[d_{S,X}] \le D_k \Pr(E^c) + \left( 1 - \frac{k}{n} \right) \Pr(E)
\le D_k + \left( 1 - \frac{k}{n} \right) \binom{n}{k} 2^{-k l \epsilon},
\]

which can be made smaller than $D_k + \epsilon$ by letting $a \to \infty$. Thus $D_k + \epsilon$ is
achievable for some sufficiently large $l$. If $m > k$ descriptions are received,
then the decoder receives $ml/n$ bits uncoded, and is able to figure out a further
$(n - m)(l/n - l D_k/(n - k))$ bits by decoding the binned erased version. Thus,
if $E^c$ occurs, the maximum per-letter distortion is
$1 - m/n - ((n - m)/n - (n - m) D_k/(n - k)) = \frac{n-m}{n-k} D_k$, and by the same analysis as above, a distortion of
$\frac{n-m}{n-k} D_k + \epsilon$ can be achieved for some sufficiently large $l$. This completes the
proof.
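A.9 Proof of Lemma 1

For any $t \in \{1, \dots, l\}$, we have exactly one of the following four cases:

Case I: $\exists\, i \in \mathcal{N}$ s.t. $\Pr(\hat{X}_{it}(X) = +) > 0$ and $\Pr(\hat{X}_{it}(X) = -) > 0$.

Case II: $\exists\, i \in \mathcal{N}$ s.t. $\Pr(\hat{X}_{it}(X) = +) > 0$ and $\Pr(\hat{X}_{it}(X) = -) = 0$, and Case I
does not hold.

Case III: $\exists\, i \in \mathcal{N}$ s.t. $\Pr(\hat{X}_{it}(X) = -) > 0$ and $\Pr(\hat{X}_{it}(X) = +) = 0$, and Case I
does not hold.

Case IV: $\forall\, i \in \mathcal{N}$, $\Pr(\hat{X}_{it}(X) = +) = \Pr(\hat{X}_{it}(X) = -) = 0$.

Let $B_1$, $B_2$, $B_3$ and $B_4$ be the sets of $t \in \{1, \dots, l\}$ satisfying Cases I, II, III and
IV, respectively. Moreover, let $|B_1| = b_1$, $|B_2| = b_2$, $|B_3| = b_3$ and $|B_4| = b_4$. Then
$b_1 + b_2 + b_3 + b_4 = l$. Now consider a source string $x^*$ such that $x^*_t = -$ if $t \in B_2$
and $x^*_t = +$ if $t \in B_3$. We have

\begin{align*}
\max_{x \in \mathcal{X}^l} \sum_{i=1}^{n} \frac{1}{l} \sum_{t=1}^{l} d(x_t, \hat{X}_{it}(x))
&\ge \sum_{i=1}^{n} \frac{1}{l} \sum_{t=1}^{l} d(x^*_t, \hat{X}_{it}(x^*)) \\
&= \frac{1}{l} \sum_{t \in B_1} \sum_{i=1}^{n} d(x^*_t, \hat{X}_{it}(x^*)) + \frac{1}{l} \sum_{t \in B_2} \sum_{i=1}^{n} d(x^*_t, \hat{X}_{it}(x^*)) \\
&\quad + \frac{1}{l} \sum_{t \in B_3} \sum_{i=1}^{n} d(x^*_t, \hat{X}_{it}(x^*)) + \frac{1}{l} \sum_{t \in B_4} \sum_{i=1}^{n} d(x^*_t, \hat{X}_{it}(x^*)).
\end{align*}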

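Consider now $t \in B_1$. Since $\hat{X}_{1t}(X), \dots, \hat{X}_{nt}(X)$ are erased versions of the same
binary random variable $X_t$, they can never disagree in the source symbol they
reveal. We therefore have $\Pr(\hat{X}_{it}(X) = +, \hat{X}_{jt}(X) = -) = 0$, $j \in \mathcal{N}$, $j \neq i$. Since
$\hat{X}_{it}(X)$ and $\hat{X}_{jt}(X)$, $i, j \in \mathcal{N}$, $i \neq j$, are pairwise independent, we have

\[
\Pr(\hat{X}_{it}(X) = +) \cdot \Pr(\hat{X}_{jt}(X) = -) = \Pr(\hat{X}_{it}(X) = +, \hat{X}_{jt}(X) = -) = 0
\implies \Pr(\hat{X}_{jt}(X) = -) = 0, \tag{A.15}
\]

since $\Pr(\hat{X}_{it}(X) = +) > 0$. Repeating the same analysis with $\Pr(\hat{X}_{it}(X) = -, \hat{X}_{jt}(X) = +)$
yields $\Pr(\hat{X}_{jt}(X) = +) = 0$. Thus $\Pr(\hat{X}_{jt}(X) = 0) = 1$ for all
$j \in \mathcal{N}$, $j \neq i$, and therefore $\hat{X}_{jt}(x^*) = 0$ for all $j \in \mathcal{N}$, $j \neq i$. Similarly, it
follows from (A.15) that $\Pr(\hat{X}_{jt}(X) = -) = 0$ for $j \in \mathcal{N}$, $j \neq i$ if $t \in B_2$ and
$\Pr(\hat{X}_{jt}(X) = +) = 0$ for $j \in \mathcal{N}$, $j \neq i$ if $t \in B_3$. Thus by construction, $\hat{X}_i(x^*)$,
$i \in \mathcal{N}$, must have $\hat{X}_{it}(x^*) = 0$ for $t \in B_2 \cup B_3 \cup B_4$. It follows that

\begin{align*}
\max_{x \in \mathcal{X}^l} \sum_{i=1}^{n} \frac{1}{l} \sum_{t=1}^{l} d(x_t, \hat{X}_{it}(x))
&\ge \frac{1}{l} \sum_{t \in B_1} \sum_{i=1}^{n} 1(\hat{X}_{it}(x^*) = 0) + \frac{1}{l} \sum_{t \in B_2} \sum_{i=1}^{n} 1(\hat{X}_{it}(x^*) = 0) \\
&\quad + \frac{1}{l} \sum_{t \in B_3} \sum_{i=1}^{n} 1(\hat{X}_{it}(x^*) = 0) + \frac{1}{l} \sum_{t \in B_4} \sum_{i=1}^{n} 1(\hat{X}_{it}(x^*) = 0) \\
&\ge \frac{1}{l} b_1 (n - 1) + \frac{1}{l} b_2 n + \frac{1}{l} b_3 n + \frac{1}{l} b_4 n \\
&= \frac{1}{l} (n l - b_1) \\
&= n - \frac{b_1}{l} \ge n - 1.
\end{align*}

This completes the proof.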
APPENDIX B

CHAPTER  3:  PROOFS 


B.1 Proof of Theorem 13


This bound differs only slightly from the outer bound proposed in [44], and
much of the proof is similar to that in [44]. Suppose (R, D) is achievable. Let
f[l\  . . . ,  fn '  be  encoders  and  (g3^)1,  /C  C  Abe  decoders  satisfying  (3.3).  Take  any 
Z  in  -0  and  augment  the  sample  space  to  include  Zl  so  that  (Zt,  Y0jt,  Y Yn+\,t) 
is  independent  over  t  G  {1,...,/}.  Next  let  T  be  uniformly  distributed  over 
(1, . . . ,  /}  and  independent  of  Zl ,  Yq,  Y'a-  and  Yr{+1.  Then  define 


Z  —  Zj- 
lb  =  Y0>t 

Yi  =  YitT  for  i  G  M 
Yn-\- 1  lrH-l,T 

Vi  =  (/®(7),  Z1:T_,,  {r!+1}\{X+i,r})  for  *  6  U 
Vj  =  Vj  T  for  j  =  1, . . . ,  J 
W  =  ({Zl}\{ZT},{Y<+1}\{Yn+hT}). 

It  can  be  verified  that  7  =  (U/v,  Vi,  •  •  • ,  Vj,  W,  T)  is  in  ro  and  that,  together  with 
Y0/  Y j[,  Yn+ 1,  and  Z,  it  satisfies  the  Markov  coupling.  It  suffices  to  show  that 
(R.  D)  is  in  TZV0(Z,  7).  Note  that  (3.3)  implies,  for  j  —  1, . . . ,  J, 


Dk.j  >  max  E[dj(Y0yT,Y/c,T,Yn+i^T,Vj'T)\, 


K:\K\=k 


i.e., 


Dk,j  —  max  E [g0 (3 0,  kn+i ,  Vj )  . 


K:\JC\=k 
Second, by the cardinality bound on entropy and the fact that conditioning
never increases entropy,


l^Ri>H 

ieK 


(B.l) 


By  the  chain  rule  for  mutual  information. 


The rest of the proof is similar to that in [44]. The main difference between this
proof  and  the  proof  in  [44]  is  that  here  we  do  not  condition  on  (/j(y/))  ^  in 
(B.l).  Taking  the  maximum  over  this  bound  and  the  bound  in  [44]  yields  the 
desired  outer  bound. 


B.2  Proof  of  Lemma  2 


Assume WLOG that $\mathcal{K} = \{1, \dots, m\}$. For each possible realization $(w, t)$ of
$(W, T)$, let

\[
D_{w,t} = E[d(X, \hat{X}_{\mathcal{K}}) \mid W = w, T = t].
\]


Let  S  =  {(w,  t)  :  DW)t  <  \/A}.  Then  by  Markov's  inequality. 


Pr((W,T)  ?S)<-j=<S. 

In  particular,  Pr ((FT,  T)  e  S)  >  0.  Also,  for  any  (w,  t )  e  S, 

32  m  (  2Dwt\1/m 

<  o. 


(B.2) 


p{l-p)  V  a 
Thus, by Lemma 6 in [44], if $(w, t) \in S$,


m 


Y I{Yi-Ui\X,W  =  w,T  =  t ) 


2=1 


>g((Dw,t  +  6)1/m)+25\og6-. 

By  averaging  over  (w,  t)  G  S  and  invoking  Corollary  1  in  [44],  we  obtain 


(ui,t)sS  i=  1 


Pr((W,T)  G  S) 


>g((D  +  8)1/m)  +  2S\og5-. 

5 


Therefore,  £  YZi  I{Y>- ui\x >  w  - T) 


> 

> 


+  tf)1/”1)  +  25  log  ^ 
5 

g((D  +  6)1'm)  +  26  log^ 

5 


■Pr((W,T)  G  S) 
(1  -6) 


=  g((D  +  ttD,S))1/m) 


for  some  continuous  £  >  0  satisfying  £(D,  0)  =  0.  It  follows  from  this  and  con¬ 
straint  (iii)  of  the  lemma  that  g(D 1/m)  >  g((D  +  £(£>,  S))1/™).  From  the  mono¬ 
tonicity  of  g(Dl/m)  in  D  (Corollary  1  in  [44]),  we  obtain  D  +  £(D,  5)  >  D.  Thus 
D  >  D  —  £(D,  5),  completing  the  proof. 